Preface



The papers in this volume were selected for presentation at the Fourth Annual 
International Computing and Combinatorics Conference (COCOON’98), held 
on August 12-14, 1998, in Taipei. The topics cover most aspects of theoretical 
computer science and combinatorics related to computing. 

Submissions to the conference this year were conducted exclusively electronically.
Thanks to the excellent software developed by the system team of the Institute of 
Information Science, we were able to make virtually all communications through 
the World Wide Web. 

A total of 69 papers was submitted in time to be considered, of which 36 
papers were accepted for presentation at the conference. In addition to these 
contributed papers, the conference also included four invited presentations by 
Christos Papadimitriou, Michael Fischer, Fan Chung Graham and Rao Kosaraju.
It is expected that most of the accepted papers will appear in a more complete 
form in scientific journals. Moreover, selected papers will appear in a special 
issue of Theoretical Computer Science. 

We thank all program committee members, their support staff and referees 
for excellent work within demanding time constraints. We thank all authors who 
submitted papers for consideration. We are especially grateful to our colleagues 
who worked hard and offered widely differing talents to make the conference 
both possible and enjoyable. 
Algorithmic Approaches to Information
Retrieval and Data Mining 



Christos H. Papadimitriou 
University of California Berkeley, USA 



Abstract. The emerging globalized information environment, with its 
unprecedented volume and diversity of information, is creating novel 
computational problems and is transforming established areas of research 
such as information retrieval. I believe that many of these problems are 
susceptible to rigorous modeling and principled analysis. In this talk I 
will focus on recent research which exemplifies the value of theoretical 
tools and approaches to these challenges. 

Researchers in information retrieval have recently shown the applicability
of spectral techniques to resolving such stubborn problems as polysemy
and synonymy. The value of these techniques has more recently been
demonstrated rigorously in a PODS 98 paper co-authored with Raghavan,
Tamaki, and Vempala, by utilizing a formal probabilistic model of
the corpus. Also in the same paper, a rigorous randomized simplification 
of the singular value decomposition process was proposed. In a paper 
in SODA 98, Kleinberg shows how spectral methods can extract in a 
striking way the semantics of a hypertext corpus, such as the world-wide 
web. 

Although data mining has been promising the extraction of interesting 
patterns from massive data, there has been very little theoretical discussion
of what "interesting" means in this context. In a STOC 98 paper
co-authored with Kleinberg and Raghavan, we argue that such a theory 
must necessarily take into account the optimization problem faced by 
the organization that is doing the data mining. This point of view leads 
quickly to many interesting and novel combinatorial problems, and some 
promising approximation algorithms, while leaving many challenging al- 
gorithmic problems wide open. 
Combinatorial Problems Arising in Massive
Data Sets



Fan Chung Graham 

University of Pennsylvania 
Philadelphia, PA 19104 



Abstract. We will discuss several combinatorial problems which are 
motivated by computational issues arising from the study of large graphs 
(having sizes far exceeding the size of the main memory). A variety of areas
in graph theory are involved, including graph labeling, graph embedding,
graph decomposition, and spectral graph theory, as well as concepts
from probabilistic methods and dynamic location/scheduling.
Estimating Parameters of Monotone Boolean
Functions

(abstract) 



Michael J. Fischer 

Department of Computer Science, P.O. Box 208285, 

Yale University, New Haven CT 06520-8285, USA 
fischer-michael@cs.yale.edu
http://www.cs.yale.edu/~fischer/

Let $F : \{0,1\}^n \to \{0,1\}$ be a monotone Boolean function. For a vector
$x \in \{0,1\}^n$, let $\#(x)$ be the number of 1's in $x$, and let
$S_k = \{x \in \{0,1\}^n \mid \#(x) = k\}$. For a multiset $R \subseteq \{0,1\}^n$,
define the $F$-density of $R$ to be

$$D(R) = \frac{|\{x \in R \mid F(x) = 1\}|}{|R|}.$$

Thus, $D(R) = \mathrm{prob}[F(X) = 1]$, where $X$ is uniformly distributed over $R$.

Let $R_k$ be a random multiset consisting of $m$ independent samples drawn
uniformly from $S_k$. Then the random variable $Y_k = D(R_k)$ is a sufficient
estimator for $D(S_k)$. A naive algorithm to compute $Y_0, \ldots, Y_n$ evaluates $F$ on each
of the $m(n + 1)$ random samples in $\bigcup_k R_k$ and then computes each $D(R_k)$.

The following theorem shows that the number of evaluations of F can be 
greatly reduced. 

Main Theorem. There is a randomized algorithm for computing $Y_0, \ldots, Y_n$ that
performs at most $m \lceil \log_2(n + 2) \rceil$ evaluations of $F$.

When $n$ is large and $F$ takes a considerable amount of time to evaluate, this
theorem permits a dramatic decrease in the time to compute the $Y_k$'s and hence
to estimate the parameters $D(S_k)$ to a given degree of accuracy.

The problem of estimating the parameters $D(S_k)$ arises in the study of
error-correcting codes. The vector $x \in \{0,1\}^n$ represents a particular error pattern,
and $F(x) = 1$ iff the error-correction algorithm fails to correct all errors. Many
error correction algorithms are monotone in the sense that additional errors can
never turn an uncorrectable error pattern into a correctable one. In this context,
$D(S_k)$ is the probability that a corrupted code word containing $k$ independent
bit errors is uncorrectable by the algorithm. Regarded as a function of $k$, it is a
useful measure of the overall error-correction ability of an algorithm.
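
The abstract states the Main Theorem without describing the algorithm. The Python sketch below shows one plausible way to reach such a bound; it is an assumption of this illustration, not necessarily the author's method. Each of the m rounds draws a random chain x_0 < x_1 < ... < x_n, with x_k uniform over S_k, by switching on bits in a random order. Because F is monotone, its values along the chain switch from 0 to 1 at most once, so a binary search over the n + 2 possible switch positions needs at most ceil(log2(n + 2)) evaluations of F per chain. The names `estimate_densities` and `chain_point` are hypothetical.

```python
import math
import random


def estimate_densities(F, n, m, rng=None):
    """Estimate D(S_0), ..., D(S_n) for a monotone F using m random chains.

    Returns (estimates, evaluations), where evaluations counts calls to F.
    """
    rng = rng or random.Random(0)
    counts = [0] * (n + 1)
    evaluations = 0
    for _ in range(m):
        order = list(range(n))
        rng.shuffle(order)

        def chain_point(k):
            # The k-th chain element: the first k positions of `order` set to 1.
            x = [0] * n
            for i in order[:k]:
                x[i] = 1
            return tuple(x)

        # Threshold t = smallest k with F(x_k) = 1, or n + 1 if there is none.
        lo, hi = 0, n + 1
        while lo < hi:
            mid = (lo + hi) // 2
            evaluations += 1
            if F(chain_point(mid)):
                hi = mid
            else:
                lo = mid + 1
        # By monotonicity F(x_k) = 1 exactly for k >= threshold.
        for k in range(lo, n + 1):
            counts[k] += 1
    return [c / m for c in counts], evaluations
```

For a fixed k, the points x_k taken from different chains are independent and uniform over S_k, which matches the definition of R_k; the samples for different k within one chain are correlated, which the definition of Y_k does not forbid.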
De-amortization of Algorithms *

Preliminary version 



S. Rao Kosaraju and Mihai Pop 



Department of Computer Science 
Johns Hopkins University 
Baltimore, Maryland 21218 



Abstract. De-amortization aims to convert algorithms with excellent
overall speed, $f(n)$ for performing $n$ operations, into algorithms that
take no more than $O(f(n)/n)$ steps for each operation. The paper reviews
several existing techniques for de-amortization of algorithms.



1 Introduction 

The worst case performance of an algorithm is measured by the maximum num- 
ber of steps performed by the algorithm in response to a single operation, while 
the amortized performance is given by the total number of steps performed in 
response to $n$ operations. De-amortization seeks to convert an algorithm with
$f(n)$ amortized speed to another with close to $O(f(n)/n)$ worst case speed.

Consider, for example, the standard pushdown in which the allowable single
step primitives are push, which pushes a single item on top of the pushdown, and
pop, which removes the topmost item of the pushdown. The pushdown is initially
empty, and it is not allowed to pop when it is empty. We want to implement
the pushdown automaton with push, specified above, and pop*, which performs a
sequence of pops until a specified condition is satisfied. Note that at any stage,
the total number of pops performed cannot be more than the total number
of pushes performed. Hence, if we implement pop* by a sequence of pops, the
realization of a sequence of $n$ operations composed of pushes and pop*'s can have
at most $n - 1$ pushes and $n - 1$ pops, resulting in an amortized speed of $2(n - 1)$.
However, all the $n - 1$ pops will be performed on the last pop* if the sequence
of operations is $n - 1$ pushes followed by a pop*. Thus, the worst case speed
for implementing a single operation is $n - 1$, while the average number of steps
per operation, that is, the average amortized speed of this algorithm, is
$2 - 2/n$. In de-amortization we seek an algorithm whose worst case
single operation speed is $O(2 - 2/n)$, that is, $O(1)$. It can easily be shown that pushdowns
cannot be de-amortized. However, when the pushdowns are allowed to have a
limited "jumping" capability, de-amortization can be achieved ([13,18]).
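
The step counts in this example can be checked with a small Python sketch. It implements pop* naively as a loop of single pops and records both the total number of primitive steps and the largest number of steps spent inside one operation. The class name `Pushdown`, the `stop` predicate and the counters are hypothetical names introduced for this illustration.

```python
class Pushdown:
    """Pushdown with push and a naive pop*, instrumented with step counters."""

    def __init__(self):
        self.items = []
        self.total_steps = 0   # primitive pushes and pops performed so far
        self.worst_steps = 0   # most primitive steps used by one operation

    def _charge(self, steps):
        self.total_steps += steps
        self.worst_steps = max(self.worst_steps, steps)

    def push(self, item):
        self.items.append(item)
        self._charge(1)

    def pop_star(self, stop):
        """Pop until the pushdown is empty or stop(top item) is true."""
        used = 0
        while self.items and not stop(self.items[-1]):
            self.items.pop()
            used += 1
        self._charge(used)


n = 10
p = Pushdown()
for i in range(n - 1):
    p.push(i)
p.pop_star(lambda top: False)      # never stops early: empties the pushdown
average = p.total_steps / n        # 2(n - 1) steps over n operations
```

With n = 10 the run performs 18 primitive steps in total, 9 of them inside the final pop*, so the average per operation is 2 - 2/n while the worst single operation costs n - 1.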

Many techniques were developed for achieving de-amortization of algorithms,
and we review some of them. Even though we present them as distinct techniques, 
they are quite inter-related. 

* Supported by NSF Grant CCR9508545 and ARO Grant DAAH04-96-1-0013



Wen-Lian Hsu, Ming-Yang Kao (Eds.): COCOON’98, LNCS 1449, pp. 4-14, 1998.
(c) Springer-Verlag Berlin Heidelberg 1998






In section 2 we present how data duplication can be used to achieve
de-amortization. Section 3 discusses techniques that are based on partially or fully
rebuilding a data-structure. Approaches that maintain global invariants in order
to achieve worst-case performance guarantees are presented in section 4. In
section 5 we examine the relationship between a certain class of pebble games and
de-amortization.

2 Data Duplication 

The earliest application of de-amortization is for converting a Turing tape with
multiple heads, $M$, into Turing tapes with one head per tape, $M'$ [11]. Each
tape of $M'$ tries to maintain a neighborhood of one of the heads of $M$, such that
the neighborhoods are non-overlapping and together cover the complete tape of
$M$. However, this process runs into difficulty when some head $h$ of $M$ tries to
move out of the neighborhood maintained by the corresponding tape of $M'$. An
amortized multi-tape, single head per tape simulator of $M$ will
stop the simulation and will redistribute the tape between $h$ and its
corresponding neighbor evenly between the two tapes and then will resume the simulation.
Such a simulator guarantees an amortized speed of $O(n)$ for simulating $n$
steps of $M$. Fischer, et al. [11] de-amortize this simulator by keeping essentially
two copies of $M$, one in the background and the other in the foreground. While
the current simulation is being performed on the foreground copy, redistribution
of data will be performed on the background copy. By the time the foreground
copy needs redistribution, the background copy will be ready to come to the
foreground, and the current foreground copy becomes the background copy.
Fischer, et al. showed that such a de-amortization strategy achieves a worst case
performance of $O(1)$ steps per operation of $M$, matching the average amortized
speed.

Baker [2] applies a similar data duplication technique for implementing a 
realtime mechanism for performing garbage collection in a list processing system. 
Continuous garbage collection is achieved by performing a (worst case) constant
number of steps after each CONS operation.

The main idea is to keep two list spaces, one which is active (the from space),
and another one into which garbage collection will put the accessible nodes
(the to space). During each CONS operation the algorithm performs a constant
number of garbage collection steps, moving accessible nodes into the to space and
keeping pointers to them in the from space. By correctly choosing the number
of garbage collection operations performed during each CONS, it is guaranteed
that eventually the to space will contain all accessible nodes, while the from
space will contain only garbage. At this point the roles of the two spaces can
be exchanged.

Gajewska and Tarjan [12] present a nice adaptation of the data duplication
technique while implementing double ended queues (deques) with heap order. 

This technique of maintaining duplicate copies of the data structure plays a 
significant role in many other strategies that will be covered in the subsequent 
sections. 
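
To make the foreground/background idea concrete, here is a minimal Python sketch of an append-only array that never pauses to copy itself in one go. It is an analogous textbook-style illustration, not one of the constructions of [2], [11] or [12]. Once the foreground array is half full, a background array of twice the capacity is created and two old cells are copied into it during every append, so the background copy is complete exactly when the foreground copy fills up and the two can swap roles. The class name `DoubleBufferedArray` and its fields are hypothetical, and the sketch assumes that allocating the background array counts as one step (Python itself initializes the list in linear time).

```python
class DoubleBufferedArray:
    """Append-only array with a constant number of cell writes per append."""

    COPIES_PER_APPEND = 2

    def __init__(self):
        self.front = [None, None]  # foreground copy: serves all reads
        self.back = None           # background copy under construction
        self.size = 0
        self.copied = 0            # cells of front already mirrored in back
        self.max_writes = 0        # worst number of cell writes in one append

    def append(self, value):
        writes = 1
        self.front[self.size] = value
        self.size += 1
        if self.back is None and 2 * self.size >= len(self.front):
            # Assumption: allocation is charged as a single step.
            self.back = [None] * (2 * len(self.front))
            self.copied = 0
        if self.back is not None:
            for _ in range(self.COPIES_PER_APPEND):
                if self.copied < self.size:
                    self.back[self.copied] = self.front[self.copied]
                    self.copied += 1
                    writes += 1
        if self.size == len(self.front):
            # The background copy is complete by now; swap the roles.
            assert self.copied == self.size
            self.front, self.back = self.back, None
        self.max_writes = max(self.max_writes, writes)

    def get(self, index):
        if not 0 <= index < self.size:
            raise IndexError(index)
        return self.front[index]
```

Since appends only write beyond the copied prefix, the background copy never becomes stale; a structure that also allowed overwriting existing cells would have to apply each write to both copies.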
3 Rebuilding a Data Structure

Overmars [28] applies the data duplication technique to achieving dynamization 
of data structures for multi-dimensional search problems. The main idea is to 
start constructing in the background a new version of the data-structure once 
the original structure has reached a certain level of unbalance. The new structure 
is slowly constructed over a series of updates. A certain restriction on updates,
designated as weak updates, guarantees that the new structure is fully built
before the original structure has become too unbalanced. Then the new structure
becomes active and the old one is removed. An update is weak iff there exist
constants $\alpha$ and $k$ such that after $\alpha n$ updates ($n$ = size of data-structure) the
query time doesn't deteriorate by more than a factor of $k$. For data-structures that
allow weak updates, Overmars establishes the following worst case result.

Theorem 1. Any data structure $S$ for some searching problem, that permits
weak deletions and weak insertions, can be dynamized into a structure $S'$ such
that:

$$Q_{S'}(n) = O(Q_S(n))$$

$$U_{S'}(n) = O(U_S(n) + P_S(n)/n)$$

where $Q_S(n)$, $U_S(n)$, and $P_S(n)$ are the average amortized speeds for performing
queries, updates, and preprocessing in $S$, respectively, and $Q_{S'}$ and $U_{S'}$ are the
worst case speeds for performing queries and updates in $S'$, respectively.

This result is applied to quad trees and k-d trees obtaining $O(\log^2 n)$ insertion
time and $O(\log n)$ deletion time, and to d-dimensional super B-trees yielding an
insertion time of $O(\log^d n)$ and a deletion time of $O(\log^{d-1} n)$.

Earlier, Bentley applied a similar technique for decomposable search problems
[3]. A search problem is called decomposable iff for any partition of the data
set and any query object, the answer can be obtained in constant time from 
the answers to the queries on the blocks of the partition. The idea is to split 
the structure into a logarithmic number of bags of blocks of increasing sizes. 
Insertions and deletions may cause blocks to over- or underflow, in which case 
the block is either moved up in the bag list, by merging with another block of 
the same size, or is split into two smaller blocks that are inserted at a lower level 
in the list. It is shown that this technique yields amortized time bounds for the 
updates. Amortization is then removed by spreading the splitting and merging 
procedures over a sequence of updates. The increase in block sizes guarantees 
that at any time, in any bag, only one block is in the process of being built. 
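
As a rough illustration of the decomposition idea, the Python sketch below implements the simplest amortized variant for membership queries, which are decomposable because the answer is the OR of the answers on the blocks. It keeps at most one sorted block of each size 2^i and merges equal-sized blocks on insertion, like carries in a binary counter. This is a simplified insertion-only sketch under those assumptions; it does not include the bags, the deletions, or the spreading of merges over later updates that removes the amortization. The class name `LogarithmicMethod` is hypothetical.

```python
from bisect import bisect_left
from heapq import merge


class LogarithmicMethod:
    """Insert-only dynamization of a static sorted-array dictionary."""

    def __init__(self):
        # blocks[i] is either None or a sorted list of exactly 2**i keys.
        self.blocks = []

    def insert(self, key):
        carry = [key]
        i = 0
        while True:
            if i == len(self.blocks):
                self.blocks.append(None)
            if self.blocks[i] is None:
                self.blocks[i] = carry
                return
            # Two blocks of equal size: rebuild them as one block one level up.
            carry = list(merge(self.blocks[i], carry))
            self.blocks[i] = None
            i += 1

    def contains(self, key):
        # Decomposability: combine the answers of the per-block queries.
        for block in self.blocks:
            if block is not None:
                j = bisect_left(block, key)
                if j < len(block) and block[j] == key:
                    return True
        return False
```

A single insertion may trigger a long chain of merges, which is exactly the cost that the de-amortized version spreads over a sequence of updates.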

The result is summarized in the following theorem: 

Theorem 2. Let $g(n)$ be a smooth, nondecreasing integer function with $0 <
g(n) \le n$. Given a data structure $S$ for a decomposable searching problem $PR$,
there exists a structure $S'$ for solving $PR$ dynamically such that:

$$Q_{S'}(n) = O(g(n))\, Q_S(n)$$

$$I_{S'}(n) = \begin{cases} O\left(\log n / \log(g(n)/\log n)\right) P_S(n)/n & \text{when } g(n) = \Omega(\log n) \\ O\left(g(n)\, n^{1/g(n)}\right) P_S(n)/n & \text{otherwise} \end{cases}$$



This result is used to solve nearest neighbor searching and common intersection
of halfspaces in $Q_{S'}(n) = O(\sqrt{n} \log n)$, $I_{S'}(n) = O(\log n)$, $D_{S'}(n) =
O(\sqrt{n} \log n)$; d-dimensional range counting and rectangle intersection counting
in $Q_{S'}(n) = O(\log^{d+1} n)$, $I_{S'}(n) = O(\log^d n)$, $D_{S'}(n) = O(\log^d n)$.



4 Global-Invariants 

In one instance of this technique, the data structure that needs to be 
de-amortized is first partitioned into many (non-constant) sub-structures which 
permit a restricted set of operations. These sub-structures are of varying sizes, 
and the size of any sub-structure will be upperbounded by an appropriate func- 
tion of the sizes of the smaller sub-structures. 

The first problem that is solved by this technique is the step-for-step simulation
of concatenable deques by deques (without concatenation) [19]. A concatenable
deque is a deque machine, with a constant number of deques, in which one
allows as a primitive the operation of concatenation of any two deques. The main
result of the paper is that every operation of a $k$ concatenable deque machine
can be simulated within $O(1)$ steps by an $O(k)$ deque machine.

Leong and Seiferas [23] earlier established that a restricted stack-of-deques
machine (with a non-constant number of deques) can be simulated step-for-step
by a deque machine. Kosaraju [19] first simulates step-for-step any concatenable 
deque machine by a stack of deques. This is achieved by partitioning each deque 
into many subdeques, then viewing all the resulting subdeques as a stack of 
deques, and finally specifying a global upperbound constraint for the size of 
any subdeque in the stack as a function of the sizes of the smaller subdeques 
in the stack. The global constraints on deque lengths guarantee that during 
the simulation access to the stack of deques is limited to the top. It is then 
shown that a step-for-step simulation of the concatenable deque machine can 
be performed by the stack of deques while maintaining the global constraints. 
Finally, the invocation of the result of [23] establishes the main result. 

A second type of global constraints is explored in [20,21]. The paper [20]
shows how to perform search-tree operations in $O(\log d)$ steps, where $d$ is the
difference in ranks of the previous searched element and the currently sought
element. The idea is to keep the keys in a set of balanced search trees of increasing
heights. Then a global constraint on the number of search trees of height no more
than $h$ is imposed for every $h$. It is shown that the maintenance of these
constraints assures $O(\log d)$ step worst case performance, and that the constraints
can be maintained dynamically.
The paper [21] translates this technique into a pebble game on a set of bins
numbered 1 through $n$. A move can remove two pebbles from bin $i$ and add
a pebble in bin $i + 1$, or remove one pebble from bin $i$ and add two pebbles
into bin $i - 1$. The goal of the game is to allow insert pebble and delete pebble
operations on any bin. If a bin is empty then a sequence of moves needs to be
performed in order to allow a delete from that bin. The complexity is measured
by the number of moves necessary. The paper shows how to perform any insert
or delete operation in bin $k$ with at most $2k + 1$ moves.
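
The two moves of this bin game behave like carries and borrows in binary arithmetic: if a pebble in bin i is given weight 2^i, both moves leave the total weight unchanged. The Python sketch below models the moves and checks this invariant. The `borrow_for` helper is only a naive way to make a delete possible from an empty bin; it does not implement the strategy of [21] and does not achieve its move bound. All names (`BinGame`, `promote`, `demote`, `borrow_for`) are hypothetical.

```python
class BinGame:
    """Pebble game on bins 1..n with the two moves described in the text."""

    def __init__(self, n):
        self.n = n
        self.pebbles = [0] * (n + 1)   # index 0 is unused
        self.moves = 0

    def promote(self, i):
        """Remove two pebbles from bin i and add one pebble to bin i + 1."""
        if not 1 <= i < self.n or self.pebbles[i] < 2:
            raise ValueError("illegal move")
        self.pebbles[i] -= 2
        self.pebbles[i + 1] += 1
        self.moves += 1

    def demote(self, i):
        """Remove one pebble from bin i and add two pebbles to bin i - 1."""
        if not 1 < i <= self.n or self.pebbles[i] < 1:
            raise ValueError("illegal move")
        self.pebbles[i] -= 1
        self.pebbles[i - 1] += 2
        self.moves += 1

    def weight(self):
        """Total weight, counting a pebble in bin i as 2**i."""
        return sum(count * 2 ** i for i, count in enumerate(self.pebbles))

    def borrow_for(self, k):
        """Naively refill bin k from the nearest non-empty bin at or above it."""
        j = k
        while j <= self.n and self.pebbles[j] == 0:
            j += 1
        if j > self.n:
            raise ValueError("no pebbles available at or above bin %d" % k)
        for i in range(j, k, -1):
            self.demote(i)
```

The naive borrow may walk arbitrarily far up the bins, which is the kind of long cascade that the invariants of [21] are designed to rule out.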

Another type of global invariants is explored in [22]. The paper contains an 
implementation of generalized deques that allow the normal deque operations, 
concatenation and find min. The implementation is on a random access machine 
(RAM). The implementation achieves all the above deque operations in $O(1)$
RAM steps in the worst case. The idea is to maintain a tree of deques which 
satisfies certain global invariants. It is shown that updates can be spread over a 
series of subsequent operations while preserving the invariants. 

A similar technique is applied in [9] to obtain worst case performance guarantees
in relaxed heaps. Their data-structure is a priority queue that supports the
operations decrease_key and delete_min in $O(1)$ and $O(\log n)$ time respectively.
Insertions are supported in constant time. The authors first show how they can
obtain these bounds in amortized time by relaxing the heap order constraint
at a logarithmic number of nodes (called "active nodes"), none of which are
siblings. They further relax this invariant by allowing the active nodes to be
siblings, in order to obtain the worst case bound. For both cases the paper presents
transformations that preserve the invariant and yield the specified time bounds.

A similar technique, called data structural bootstrapping, was introduced in
[4,5]. The main idea is to allow elements of a data-structure to be smaller versions
of the same data-structure. In [4], the authors create concatenable heap ordered
deques by using heap ordered deques. Concatenation is realized by allowing
elements of a deque to point to other deques. The resulting tree-like structure can
perform all operations in $O(1)$ worst case if only a constant number of
concatenations are allowed. This variation of data structural bootstrapping is called by
the authors structural abstraction. In [5], the authors introduce a new variant
of data structural bootstrapping, called structural decomposition. They address
the problem of making deques confluently persistent. They represent deques as
balanced trees. The trees are, in turn, decomposed into spines (i.e. extremal
paths from a node to a leaf). The spines are recursively represented as deques.
This approach yields an $O(\log^* n)$ worst case bound for the $n$-th operation on
the data-structure. This is due to the fact that each operation needs to traverse
the whole recursive structure. An improvement is presented in [16] by using a
technique called recursive slow-down. This technique avoids the need to recurse
at each step. The authors ensure that two operations need to be performed at
a level of the recursive structure before the next level structure needs to be
accessed. They achieve this, in a manner similar to binary counting, by assigning
each level a color (yellow, green, or red) which represents its unbalancedness.
Green levels are good, red levels are bad, and yellow levels are intermediate. A
red level can be made green with the cost of having to change the color of the
next level from green to yellow, or yellow to red. They maintain as invariant that
red and green colors alternate among levels. This technique allows the authors
to improve the results of [5] so that each operation can be done in $O(1)$ worst
case time.
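
The binary-counting analogy can be illustrated with a redundant binary counter whose digits are 0, 1 and 2, read here as green, yellow and red; this reading, and the whole sketch, are assumptions made for illustration and not the deque structure of [16]. The counter keeps the invariant that below every 2 there is a 0 with only 1s in between. An increment adds one to the lowest digit and then repairs the lowest non-1 digit if it is a 2, turning it into 0 and passing a carry one level up, so only a constant number of digits change per increment. The class name `RegularCounter` and its fields are hypothetical.

```python
class RegularCounter:
    """Counter with digits 0/1/2 and a constant number of digit changes
    per increment (digits are stored least significant first)."""

    def __init__(self):
        self.digits = []    # digits[i] has weight 2**i
        self.non_one = []   # indices of digits != 1, smallest index last

    def _bump(self, i):
        # Add 1 to digit i; no index smaller than i is on the stack here.
        if i == len(self.digits):
            self.digits.append(1)        # an implicit leading 0 becomes 1
        elif self.digits[i] == 0:
            self.digits[i] = 1           # green -> yellow
            self.non_one.pop()           # i was the smallest non-1 index
        else:
            assert self.digits[i] == 1   # the invariant rules out a 2 here
            self.digits[i] = 2           # yellow -> red
            self.non_one.append(i)

    def increment(self):
        self._bump(0)
        if self.non_one and self.digits[self.non_one[-1]] == 2:
            i = self.non_one.pop()
            self.digits[i] = 0           # red -> green ...
            self._bump(i + 1)            # ... at the cost of the next level
            self.non_one.append(i)

    def value(self):
        return sum(d << k for k, d in enumerate(self.digits))
```

An ordinary binary counter can flip a long run of digits on one increment; here the repair step touches at most two levels, which mirrors how recursive slow-down avoids recursing through the whole structure at each operation.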

In [10], Driscoll et al. examine the problem of making data-structures
persistent. Relative to de-amortization they show how balanced search trees can be
made fully persistent with $O(\log n)$ insertion and deletion time and $O(1)$ space
in the worst case. Their approach is also based on maintaining a global
invariant. Change records for the nodes of the tree are not necessarily located at the
modified node, but somewhere on an access path from the root to the node,
called a displacement path. As invariants, they require that all change records
have displacement paths, that for any change record the nodes on its displacement
path are not its version-tree descendants, and that the displacement paths for
the version-tree ancestors of a node that are also on its displacement path are
disjoint from the node's displacement path. The authors show how to maintain
these invariants during update operations.

Willard and Lueker [32] show how to extend dynamic data structures to allow 
range restriction capabilities with only a logarithmic blowup in space, update 
time and query time. Their structure is a balanced binary tree on the range 
components, augmented at each node $v$ with a version of the original structure
($AUX(v)$) for the elements stored in the subtree rooted at $v$. The amortized
analysis requires rebuilding of the $AUX(v)$ structures whenever the range tree
needs rebalancing. For the worst case analysis they allow incomplete $AUX$ fields
to exist within the tree while being slowly completed over a series of updates. 
They preserve the invariants that each incomplete node has complete children 
and the tree is balanced. It is shown that even though the rebalancing procedure 
cannot be performed at certain nodes in order to preserve the first invariant, the 
tree does not become unbalanced, and a constant number of operations per 
update are sufficient to maintain the invariants. 

5 Pebble Games and Applications 

A set of pebble games has been used to design worst case algorithms for a series
of problems. The pebble games involve a set of piles of pebbles. Two players take 
turns at adding or removing pebbles from the piles. The increaser (player I) can 
add pebbles to several piles while the decreaser (player D) can remove pebbles 
from the piles. 

A first game [8] allows player I to split a pebble into fractional pebbles and
to add them to some of the piles. Player D can remove all the contents of a
single pile. It is shown that if D always removes all the pebbles in the largest
pile, the size of any pile is bounded by the harmonic number $H_n$, where $n$ is the
number of piles.
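
The behaviour of this game can be explored with a short simulation. In the Python sketch below, player I is modelled by a random split of one pebble per round (an assumption; a real adversary may play far less favourably), and player D always empties the largest pile. The function returns the largest pile size observed right after a move of player I. The names `harmonic` and `play_zeroing_game` are hypothetical, and a random run is of course only a sanity check of a logarithmic bound, not a proof.

```python
import random


def harmonic(n):
    """The n-th harmonic number H_n = 1 + 1/2 + ... + 1/n."""
    return sum(1.0 / i for i in range(1, n + 1))


def play_zeroing_game(n, rounds, seed=0):
    """Simulate the game on n piles; return the largest pile size seen."""
    rng = random.Random(seed)
    piles = [0.0] * n
    largest_seen = 0.0
    for _ in range(rounds):
        # Player I: split one pebble into fractions and spread them around.
        weights = [rng.random() for _ in range(n)]
        total = sum(weights)
        for i in range(n):
            piles[i] += weights[i] / total
        largest_seen = max(largest_seen, max(piles))
        # Player D: remove the entire contents of the largest pile.
        piles[piles.index(max(piles))] = 0.0
    return largest_seen


worst = play_zeroing_game(8, 2000)
```

In the accompanying check the observed maximum is compared against the slightly looser value H_n + 1, chosen as a safety margin because the maximum is measured before player D has moved.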

This result is applied to the following problem. 
Design a data-structure that supports the following operations in constant
time. 

- insert(x,y): insert record y after record x

- delete(x): delete record x

- order(x,y): return true if x is before y and false otherwise.

The data-structure proposed in [8] is a 4 level tree in which each internal 
node has a fan-out of O(logn) elements. The root has 0(n/ log^ n) children. The 
children of the root are stored in a data-structure due to Willard [30,31] that
allows insertion, deletion, and order operations to be performed in $O(\log^2 n)$.
The order of two nodes is determined by finding their nearest common ancestor
(nca) and comparing the order of the children of nca that are ancestors of the
two nodes.

Whenever insertions cause an internal node to have too many descendants
(more than $\log^h n$ where $h$ is the height of the node) the node is split into two
nodes which are inserted in the parent. This operation is performed over $\log n$
insertions so that the time per update is $O(1)$.

The pebble game is used to guarantee that at the root, insertions in the 
data-structure can be done slowly over log^ n updates, without the need to split 
a depth 1 node (child of the root) before a previous split has been completed. The 
algorithm anticipates the split of a level 1 node by inserting overflow nodes into 
the root structure whenever the node is close to becoming full (has almost log^ n 
descendants). We define the fullness of the node to be the difference between 
the size of the subtree rooted at the node and 2/31og^n. The connection with 
the pebble game becomes apparent. The chosen node is always the one with the 
largest value for fullness. Due to the result of the game, the fullness increase of 
any node is bounded by 0(log n) for each insertion. Thus, over log^ n operations, 
the fullness of any node cannot increase by more than O(log^n), implying that 
the overflow node has been inserted into the root by the time a node needs to 
be split. 

Raman [29] uses the same game to ensure worst case performance in dynamic
two-level data-structures. His approach starts with a data-structure organized as
a collection of 0(n/s(n)) buckets of size 0(s(n)). If a bucket of size k can be cre- 
ated (deleted) in partitioned (fused with another bucket) in Tp[k) and an 

element can be inserted in Tb{k)^ he shows how the structure can be dynamized 
to allow updates to be performed in 0{ -\-Tij(^slogn)) worst 

case time where s is the bucket size of the static structure. 

The main idea is to split or fuse buckets that are either too big or too small.
The criticality of buckets is defined as a value that reflects the largeness or
smallness of the bucket. The value of the criticality of a bucket plays the role of
the pebbles in the above zeroing game. This pebble game ensures that buckets
will not exceed $O(s \log n)$ in size when splits and fuses are spread over $O(s)$
operations, thus obtaining the time bounds claimed above. As an application of
this result, the author shows how to construct a finger search tree with constant
update time. Previous results showed only amortized constant time bounds [15].
As another application of the zeroing pebble game, Raman [29] presents a
data-structure for dynamic fractional cascading [6,7,25] that supports insertions
and deletions in worst case $O(\log d + \log\log s)$, where $s$ is the space requirement
and $d$ is the degree. Queries are supported in $O(t(\log\log\log s) + \log s)$, where
$t$ is the path length. The pebble game is used to guarantee that blocks are of
bounded size in the context of updates, thus allowing efficient transitions across
the edges of the graph.

Raman [29] introduces a pebble game played on bounded in-degree digraphs.
In this game player I can add a pebble to some vertex or modify the connectivity
of the graph, while preserving the degree constraints. Player D can remove all
pebbles from a constant number of vertices, at the same time adding one pebble
to each predecessor of each processed vertex. It is shown that player D has a
strategy which guarantees that no vertex can have more than $O(\log n)$ pebbles
for any constant degree bound. Moreover, the strategy can be implemented in
constant time.

This result is applied to the problem of making bounded in-degree
data-structures partially persistent. The main result is

Theorem 3. ([29]). Data structures where the in-degree of any node is at most
$O(\log^c n)$ after $n$ operations on an initially empty data structure, for some
constant $c$, can be made persistent with $O(1)$ worst case slowdown on a RAM.

The idea of the persistent data-structure is to keep at each node a set of
versions of the data in the node. Nodes that are too large are slowly split over a
series of updates; however, the size of the largest node will remain bounded due
to the result shown for the pebble game.

As an application, the author shows a data-structure that performs partially
persistent set union in $O(\log n / \log\log n)$ worst case time and $O(n\alpha(m, n))$ space
on a RAM.

Raman [29] also introduces a variant of the zeroing game in which player D
is allowed to remove at most a constant number of pebbles from a single pile.
He shows that if D always chooses the largest pile and removes as many pebbles
as possible from it, the size of the largest pile will be bounded by $\ln n + 1$, even
if only one pebble can be removed at a time.

This result is applied to making bounded in-degree data-structures fully
persistent. The author shows how a data structure with degree bound $d$ can be
made fully persistent with $O(\log\log m + \log d)$ worst case slowdown for access
and update steps, and $O(1)$ worst case space per update.

Unlike the partial persistence case, in the case of full persistence it is not 
possible to always split a node whenever it becomes too large. The algorithm 
picks the largest node and transfers $O(d)$ version records to a new node and
then restores pointer invariants. It is now clear that the decrementing game will
be used, instead of the zeroing game, to guarantee that the number of versions
stored at each node is bounded by $O(d \log m)$ after $m$ update steps.

Levcopoulos and Overmars [24] use a version of a pebble game in which player
I can add a total of $k$ pebbles to some piles, and player D can split a pile in
half. If D always splits the largest pile, the authors show that the size of any pile
is bounded by $4k \log n$.

This game is applied to a data-structure for answering member and neighbor
queries in $O(\log n)$ time and that allows for $O(1)$ time updates once the
position of the element is known. Previous solutions could achieve only amortized
constant time bounds for updates [14,15,27].

The data-structure is a search tree that stores at most $O(\log^2 n)$ elements
in each leaf. After every $\log n$ insertions, the largest leaf is split into two equal
halves. Due to the result proved for the game, the size of the largest leaf is
bounded by $O(\log^2 n)$ (with $k = \log n$ over $\log n$ updates). The split and the insertion of
the new leaf in the tree are performed over $\log n$ updates during which time no
leaves are split (splits occur only every $\log n$ updates). Thus the size invariant is
preserved and each insertion is performed in constant time. Deletions are handled
by "global rebuilding" as described in [28].

6 Conclusions 

The paper reviewed several known techniques for de-amortization of algorithms. 
The techniques ranged from simple data duplication to elaborate pebble games 
played on graphs. In spite of the existence of these techniques, it is extremely 
hard to de-amortize new algorithms. A systematic study of de-amortization will 
be of great utility. 
1 Introduction

Given a planar point set S, a triangulation of S is a maximal set of
non-intersecting edges connecting points in S. Triangulating a point set has many
applications in computational geometry and other related fields. Specifically, in
numerical solutions for scientific and engineering applications, poorly shaped
triangles can cause serious difficulty. Traditionally, triangulations which minimize
the maximum angle, maximize the minimum angle, minimize the maximum edge
length, and maximize the minimum height are considered. For example, if angles
of triangles become too large, the discretization error in the finite element
solution is increased and, if the angles become too small, the condition number
of the element matrix is increased [1, 10, 11]. Polynomial time algorithms have
been developed for determining those triangulations [2, 7, 8, 15]. In
computational geometry another important research objective is to compute the
minimum weight triangulation. The weight of a triangulation is defined to be the
sum of the Euclidean lengths of the edges in the triangulation. Despite the
intensive study made during the last two decades, it remains unknown whether the
minimum weight triangulation problem is NP-complete or polynomially solvable.
In this paper we consider two new classes of optimal triangulations:

Problem (1) the minimum weight triangulation with the minimum (resp.
maximum) angle in the triangulation not smaller (resp. greater) than a given
value α (resp. γ);

Problem (2) the triangulation which minimizes the sum over all triangles of
ratios defined by the values of the maximum angles to the minimum angles
in the triangles.

If the value of α is zero and γ is equal to π, then Problem (1) reduces to the
minimum weight triangulation problem. If α is defined as the maximum value of
the minimum angles among all possible triangulations, the solution of Problem
(1) may give the Delaunay triangulation, although this case is not equivalent to
the Delaunay triangulation problem. Therefore, Problem (1) contains the minimum
weight triangulation as a special instance. We identify Problem (1) as the
minimum weight triangulation problem with angular constraints and Problem
(2) as the angular balanced triangulation problem. In Problem (1) we propose
somewhat more general criteria for the minimum weight triangulation and in
Problem (2) a new criterion which is different from the minmax angle criterion.
Although no evidence of applications of these new criteria has been found, we
believe that they should be potentially useful, since the angular conditions on
the angles in the minimum weight triangulation allow one to control the quality
of the triangulation generated, and the sum of the ratios over all
triangles in the angular balanced triangulation contains more information than
the minmax or the maxmin angle criterion does.

The main purpose of the paper is to provide solution methods for computing
the optimal triangulations defined above. The difficulty of determining the
optimal triangulation depends on the position of the points in the given set. If
the points are vertices of a simple polygon then the problems are easily solvable.
In fact, the dynamic programming approach can be applied to both classes and
provides polynomial time algorithms. For general point sets, the apparent
difficulty of computing the minimum weight triangulation means that it is
unlikely that we can design a polynomial algorithm for the min-sum type
problems at the moment. Moreover, due to the angular conditions we cannot expect
heuristic methods such as edge-flipping and greedy methods to work well for
these new classes.

On the other hand, recent research has revealed promising ways to determine
large subgraphs, namely, the β-skeleton [12, 13] and the LMT-skeleton [4, 5,
6] of the minimum weight triangulation. The experimental results show that
these subgraphs are well connected for most point sets of relatively
small size that are generated from uniform random distributions. Therefore they
are useful for the design of algorithms to compute the exact minimum weight
triangulation of point sets of small size. When the number of connected
components is small, a complete exact minimum weight triangulation can be
produced by using the O(n^3) dynamic programming algorithm [14] on each
possible polygon which is induced by the subgraph.

Unfortunately, the definition of the β-skeleton relies heavily on the distances
between pairs of points, so it is not applicable to the new problems, which involve
angular conditions. However, since the main idea of the LMT-skeleton for the
minimum weight triangulation is the determination of the edges in every locally
minimal triangulation, it suggests that there is room left to generalize the concept
to the new classes of optimal triangulations through an appropriate definition of
local optimality. This motivates us to design a generalized unifying method for
the computation of subgraphs for other classes of optimal triangulations. Our
new results are as follows.

- O(n^3) time and O(n^2) space algorithms for the problems in each class with
the point set being a vertex set of a simple polygon;

- O(n^4) time and O(n^2) space algorithms for computing the subgraphs of
the minimum weight triangulation with angular constraints and the angular
balanced triangulation;

- the computational results for the two algorithms which demonstrate their
usefulness.

The organization of this paper is as follows. Section 2 gives polynomial time
algorithms based on the dynamic programming approach for computing the optimal
triangulations defined above with the point set being a vertex set of a simple
polygon. Section 3 presents the algorithm for the computation of the subgraph of
the minimum weight triangulation with angular constraints. Section 4 introduces
the algorithm for the determination of the subgraph of the angular balanced
triangulation. Section 5 reports computational results and Section 6 states the
conclusions.



2 Polynomial Time Algorithms 

In this section we confine ourselves to the point set which is a vertex set of
a simple polygon. We give polynomial time algorithms based on the dynamic
programming approach for determining the minimum weight triangulation with
angular constraints and the angular balanced triangulation.

Bern and Eppstein [3] discussed a class of optimal triangulation problems
which admit efficient solutions. The class possesses so-called decomposable
measures, which allow one to compute the measure of the entire triangulation
quickly from the measures of two pieces of the triangulation, along with the
knowledge of how the pieces are put together. The decomposable measures
include the minimum (maximum) angle in the triangulation, the minimum
(maximum) circumcircle of a triangle, the minimum (maximum) length of an edge
in the triangulation, the minimum (maximum) area of a triangle, and the sum
of edge lengths in the triangulation. They also presented polynomial time
algorithms which use the dynamic programming approach attributed to Klincsek
[14]. We will add the problems of determining the minimum weight
triangulation with angular constraints and the angular balanced triangulation to
the decomposable class and present polynomial time algorithms for solving these
problems.

The Minimum Weight Triangulation with Angular Constraints: 

Denote the minimum weight triangulation of a point set S with respect to α and
γ by MWT(S, α, γ). A triangle is defined to be admissible if it satisfies the
angular conditions; otherwise it is inadmissible. Label the vertices p_1, p_2,
..., p_n of the simple polygon in the clockwise order. The polygon defined by the
point set S is called P. An edge p_i p_j (j > i + 1) is said to be interior to P
if the line segment connecting p_i and p_j splits P into two polygons whose union
is P. An interior edge is called a diagonal edge of P. Let w(i, j) be the weight
of the minimum weight triangulation of the polygon involving the points
p_i, p_{i+1}, ..., p_j. However, if p_i p_j is not a diagonal edge of the polygon
P, define w(i, j) = +∞. The algorithm for computing the minimum weight
triangulation MWT(S, α, γ) is as follows.

Algorithm MWT(S, α, γ):

Step 1. For k = 1, i = 1, 2, ..., n - 1 and j = i + k, let w(i, j) = d(p_i, p_j),
where d(p_i, p_j) is the length of edge p_i p_j.

Step 2. Let k = k + 1. For i = 1, 2, ..., n and j = i + k ≤ n, if the edge
p_i p_j is not interior to P let w(i, j) = +∞. Otherwise let

    M = {m | i < m < j, p_i p_m p_j is admissible and both the edge
         p_i p_m and the edge p_m p_j are diagonal}.

Compute

    w(i, j) = d(p_i, p_j) + min_{m ∈ M} {w(i, m) + w(m, j)}   for M ≠ ∅,
    w(i, j) = +∞                                              otherwise.     (1)

For each pair (i, j) such that w(i, j) < ∞, let m*(i, j) be the index where
the minimum in (1) is achieved.

Step 3. If k < n - 1, go to Step 2. Otherwise w(1, n) is the minimum weight.

Step 4. If w(1, n) < ∞, then backtrace along the pointers m* to determine
the edges of the minimum weight triangulation. Otherwise no triangulation
satisfying the angular conditions exists.

For each pair (i, j), whether the edge p_i p_j is interior to P can be decided by
calculating the intersection of the edge p_i p_j with the O(n) boundary edges of P.
More precisely, if there is an intersection then the edge p_i p_j is not interior
to P; otherwise it is interior to P. We may calculate d(p_i, p_j) for each pair
(i, j) and store the information at the beginning of the algorithm. With O(n)
testing time for each edge, the interior test for all edges p_i p_j needs O(n^3)
time and O(n^2) space. Since the admissibility test of a triangle takes constant
time, Step 2 requires O(kn) time for each k = 1, ..., n - 2. Therefore Steps 1-3
take O(n^3) time. In Step 4 the determination of the edges in the minimum weight
triangulation can be done in O(n) time; therefore the algorithm runs in a total
of O(n^3) time and O(n^2) space.
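The recurrence in Steps 1-4 can be sketched in a few lines of Python. This is only an illustration under a simplifying assumption that is not part of the algorithm above: the polygon is taken to be convex, so every chord is interior and the interior test is skipped. The names mwt_convex, admissible and diagonals are hypothetical, vertices are indexed from 0, and a small tolerance eps is added to the angle comparison.

```python
import math

def angles(a, b, c):
    """Return the three interior angles (radians) of triangle abc."""
    def angle_at(p, q, r):
        # Angle at vertex p between the rays p->q and p->r.
        ux, uy = q[0] - p[0], q[1] - p[1]
        vx, vy = r[0] - p[0], r[1] - p[1]
        return math.atan2(abs(ux * vy - uy * vx), ux * vx + uy * vy)
    return angle_at(a, b, c), angle_at(b, c, a), angle_at(c, a, b)

def admissible(a, b, c, alpha, gamma, eps=1e-12):
    """A triangle is admissible if all its angles lie in [alpha, gamma]."""
    t = angles(a, b, c)
    return min(t) >= alpha - eps and max(t) <= gamma + eps

def mwt_convex(pts, alpha=0.0, gamma=math.pi):
    """Interval DP for a CONVEX polygon (assumption: no interior test needed)."""
    n = len(pts)
    w = [[math.inf] * n for _ in range(n)]
    split = [[None] * n for _ in range(n)]
    for i in range(n - 1):                      # Step 1: boundary edges
        w[i][i + 1] = math.dist(pts[i], pts[i + 1])
    for k in range(2, n):                       # Steps 2-3: growing chord length
        for i in range(n - k):
            j = i + k
            best, arg = math.inf, None
            for m in range(i + 1, j):
                if admissible(pts[i], pts[m], pts[j], alpha, gamma):
                    cost = w[i][m] + w[m][j]
                    if cost < best:
                        best, arg = cost, m
            if arg is not None:                 # otherwise w[i][j] stays +inf
                w[i][j] = math.dist(pts[i], pts[j]) + best
                split[i][j] = arg
    return w[0][n - 1], split

def diagonals(split, i, j):
    """Step 4: backtrace along the stored split indices."""
    m = split[i][j]
    if m is None:
        return []
    out = []
    if m - i > 1:
        out.append((i, m))
    if j - m > 1:
        out.append((m, j))
    return out + diagonals(split, i, m) + diagonals(split, m, j)
```

For a simple, non-convex polygon the sketch would additionally need the interior test described above before a chord is allowed to take a finite weight.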

The Angular Balanced Triangulation: 

Let t be an arbitrary triangle in some triangulation T. Let θ_large (resp.
θ_small) be the largest (resp. smallest) angle in the triangle. Recall that the
measure function f(t) of the triangle t is defined by f(t) = θ_large/θ_small.
Specifically, for a degenerate t, i.e., when t is an edge, we define f(t) = +∞.
For any two nondegenerate triangles t and t', let f(t, t') = f(t) + f(t'). The sum
of the ratios of the triangulation T, denoted by f(T), is defined as
f(T) = Σ_{t ∈ T} f(t). Therefore, the angular balanced triangulation is the
triangulation which minimizes the value f(T) over all triangulations T.
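As a small illustration of the measure just defined, the following Python sketch evaluates the ratio of the largest to the smallest angle of a triangle and sums it over a list of triangles. The function names are hypothetical, and treating any triangle with a zero angle as degenerate is an assumption of this sketch.

```python
import math

def interior_angles(a, b, c):
    """Return the three interior angles (radians) of triangle abc."""
    def angle_at(p, q, r):
        ux, uy = q[0] - p[0], q[1] - p[1]
        vx, vy = r[0] - p[0], r[1] - p[1]
        return math.atan2(abs(ux * vy - uy * vx), ux * vx + uy * vy)
    return angle_at(a, b, c), angle_at(b, c, a), angle_at(c, a, b)

def ratio_measure(a, b, c):
    """Largest angle divided by smallest angle; +inf for a degenerate triangle."""
    angs = interior_angles(a, b, c)
    smallest = min(angs)
    if smallest == 0.0:          # collinear or repeated points
        return math.inf
    return max(angs) / smallest

def triangulation_measure(triangles):
    """Sum of the ratio measure over all triangles of a triangulation."""
    return sum(ratio_measure(*t) for t in triangles)
```

An equilateral triangle attains the smallest possible value of the measure, and a right isosceles triangle has a ratio of 90 degrees to 45 degrees.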

We denote by P(i, j) the polygon formed by the points p_i, p_{i+1}, ..., p_j. Let
F(i, j) be the minimum value of f(T_ij) over all triangulations T_ij of P(i, j).
Define F(i, j) = +∞ for each boundary and non-diagonal edge p_i p_j of the
polygon P. We compute F(1, n) by the dynamic programming method. Suppose that a
diagonal edge p_a p_b splits P(i, j) into two polygons P_1 and P_2. Let T_1 and
T_2 be the triangulations of P_1 and P_2, respectively, and T_ij = T_1 ∪ T_2. We
define a function g as follows.

    g(f(T_1), f(T_2), p_a, p_b) = f(T_1) + f(T_2) = f(T_ij).

If the edge p_a p_b is on the boundary of the polygon P, then p_a p_b is an edge
of the polygon P(i, j), and g is defined separately for this case.

Note that in any triangulation of P(i, j), p_i p_j must be a side of a triangle,
say p_i p_j p_m, with i < m < j. We can compute the value of the angular balanced
triangulation of P(i, j) by trying all choices of m. The algorithm for computing
the angular balanced triangulation can be obtained by replacing w(i, j) with
F(i, j) and Step 2 with the following Step 2' in Algorithm MWT(S, α, γ).

Step 2'. Let k = k + 1. For i = 1, 2, ..., n and j = i + k ≤ n, compute

    F(i, j) = min_{i < m < j} g(g(f(p_i p_j p_m), F(i, m), p_i, p_m),
                                F(m, j), p_m, p_j).                          (2)

For each pair (i, j) let m*(i, j) be the index where F(i, j) in (2) is achieved.

Note that each evaluation of g takes constant time per possible value of m,
which gives a total of O(kn) time in Step 2'. Following a similar analysis of
Algorithm MWT(S, α, γ) one can show that this algorithm also takes O(n^3) time
and O(n^2) space.
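The min-sum recurrence for the angular balanced triangulation can be sketched in the same way. The sketch below makes two assumptions that are not in the text: the polygon is convex (so no diagonal test is needed), and a boundary edge simply contributes 0 to the sum, which stands in for the special handling of boundary edges by g. The names abt_convex and ratio are hypothetical, and vertices are indexed from 0.

```python
import math

def ratio(a, b, c):
    """Largest/smallest interior angle of triangle abc (+inf if degenerate)."""
    def angle_at(p, q, r):
        ux, uy = q[0] - p[0], q[1] - p[1]
        vx, vy = r[0] - p[0], r[1] - p[1]
        return math.atan2(abs(ux * vy - uy * vx), ux * vx + uy * vy)
    angs = (angle_at(a, b, c), angle_at(b, c, a), angle_at(c, a, b))
    return math.inf if min(angs) == 0.0 else max(angs) / min(angs)

def abt_convex(pts):
    """Minimum sum of angle ratios over all triangulations of a convex polygon."""
    n = len(pts)
    # F[i][i+1] = 0: a boundary edge bounds no triangle (sketch assumption).
    F = [[0.0] * n for _ in range(n)]
    split = [[None] * n for _ in range(n)]
    for k in range(2, n):
        for i in range(n - k):
            j = i + k
            best, arg = math.inf, None
            for m in range(i + 1, j):
                # Triangle p_i p_m p_j plus the optimal values of the two pieces.
                val = ratio(pts[i], pts[m], pts[j]) + F[i][m] + F[m][j]
                if val < best:
                    best, arg = val, m
            F[i][j] = best
            split[i][j] = arg
    return F[0][n - 1], split
```

The triple loop mirrors the O(kn) work per value of k noted above, giving cubic time and quadratic space for this sketch as well.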



3 The Subgraph of the Minimum Weight Triangulation 
with Angular Constraints 



In this section we present the algorithm for computing the subgraph of the
minimum weight triangulation with angular constraints for point sets in general
position. Designate the set of all possible edges connecting two points in S by
E(S). We assume that the values of α and γ are given so that there are
always triangulations satisfying the angular constraints. In order to determine the
minimum weight triangulation MWT(S, α, γ), we only need to consider
triangulations which contain no inadmissible triangles. For the entirety of our
discussion in this section, the term triangulation will mean a triangulation which
consists solely of admissible triangles.

Let e be an edge in an arbitrary triangulation. If e is not on the boundary
of the convex hull of the set S, then there exist two admissible triangles t_1
and t_2 such that t_1 ∩ t_2 = e. If the quadrilateral t_1 ∪ t_2 is convex, then it
contains another diagonal, e'. Denote by t_1' and t_2' the two triangles formed
by connecting the edge e'. The edge e is defined to be locally minimal with
respect to (α, γ) if either one of the following two cases holds: Case 1:
t_1 ∪ t_2 is not convex; Case 2: t_1 ∪ t_2 is convex and either (i) |e| ≤ |e'|,
or (ii) at least one of the triangles t_1' and t_2' is inadmissible. A
triangulation is called locally minimal with respect to (α, γ) if each edge e in
the triangulation is locally minimal with respect to the two triangles containing
e and (α, γ). From the definition, it follows that MWT(S, α, γ) is a locally
minimal triangulation with respect to (α, γ). The intersection of all locally
minimal triangulations must be a subgraph of MWT(S, α, γ). Our algorithm intends
to find a subgraph of the intersection.
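The local minimality test for a single edge can be written down directly from the two cases above. The following Python sketch assumes the edge ab is shared by the triangles axb and ayb, treats the comparison in Case 2(i) as non-strict, and adds a small tolerance to the angle test; the function names are hypothetical.

```python
import math

def orient(p, q, r):
    """Twice the signed area of triangle pqr (positive if counterclockwise)."""
    return (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])

def admissible(a, b, c, alpha, gamma, eps=1e-12):
    """True if every interior angle of triangle abc lies in [alpha, gamma]."""
    def angle_at(p, q, r):
        ux, uy = q[0] - p[0], q[1] - p[1]
        vx, vy = r[0] - p[0], r[1] - p[1]
        return math.atan2(abs(ux * vy - uy * vx), ux * vx + uy * vy)
    t = (angle_at(a, b, c), angle_at(b, c, a), angle_at(c, a, b))
    return min(t) >= alpha - eps and max(t) <= gamma + eps

def locally_minimal(a, b, x, y, alpha=0.0, gamma=math.pi):
    """Test edge ab against the pair of triangles axb and ayb."""
    # The quadrilateral a-x-b-y is convex exactly when ab and xy cross properly.
    convex = (orient(a, b, x) * orient(a, b, y) < 0
              and orient(x, y, a) * orient(x, y, b) < 0)
    if not convex:
        return True                                  # Case 1
    if math.dist(a, b) <= math.dist(x, y):
        return True                                  # Case 2 (i)
    # Case 2 (ii): the flip is blocked if a flipped triangle is inadmissible.
    return not (admissible(x, a, y, alpha, gamma)
                and admissible(x, b, y, alpha, gamma))
```

With α = 0 and γ = π the test reduces to the usual comparison of the two diagonals; a larger α can make a longer edge locally minimal because the flipped triangles would violate the angular conditions.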

We define a triangle as empty if it contains no points of S except the vertices.
An edge or an empty triangle that is known not to be contained in any locally
minimal triangulation is called dead. Therefore all inadmissible
triangles are dead initially. When each edge is examined by the algorithm, its
status will be determined as active, inactive, or dead as follows. Let T be the
set of pairs {axb, ayb} of empty active triangles, one from each side of edge ab,
such that axb ∩ ayb = ab. The edge ab is labeled active if it lies on the boundary
of the convex hull of S, or if there exists {axb, ayb} ∈ T such that ab ∩ xy ≠ ∅
and ab is locally minimal with respect to {axb, ayb} and (α, γ).

Suppose that ab is not labeled active. Then, if T = ∅, or ab ∩ xy ≠ ∅ for all
{axb, ayb} ∈ T, we label ab dead. Otherwise, we label ab inactive. If an edge ab
becomes inactive or dead, we label some of the admissible triangles bounding ab
as dead. More precisely, define the set A as the collection of the empty admissible
active triangles axb which satisfy:

axb ∩ ayb = ab and ab ∩ xy ≠ ∅ for all empty admissible active triangles ayb
such that x and y are on different sides of ab.

We label all triangles in the set A dead.

We present the algorithm below. It produces a set of edges which is called the
LMT-skeleton of the minimum weight triangulation with angular constraints and
is denoted by LMT-skeleton(S, α, γ). Three edge sets are used in the algorithm:
candEdges, edgein and deadEdges. All edges in E(S) are initially active.

We note additionally that (1) all edges in E(S), except the convex hull edges,
are in candEdges, (2) edgein contains the convex hull edges and (3) deadEdges
is empty.

The Algorithm LMT(S, α, γ):

Input: point set S.

Output: edge set edgein.

Step 0. Set all edges of candEdges unexamined.

Step 1. If there are no unexamined edges, go to Step 4. Otherwise choose an
unexamined edge e ∈ candEdges and check all empty triangles on both sides of
e. If they are all inadmissible then delete e from candEdges and move it to
deadEdges. Otherwise, go to Step 2.

Step 2. Find all combinations of empty admissible triangles t_i and t_j on the
two sides of e such that t_i and t_j are not bordered by an edge in deadEdges.

Step 3. For each combination of t_i and t_j, test if e is locally minimal with
respect to t_i, t_j and (α, γ). If e is not locally minimal with respect to any
such pair t_i and t_j, then move e to deadEdges. Otherwise, mark e active or
inactive according to the definitions. Go to Step 1.

Step 4. For each edge marked active or inactive, if it intersects no other active
edges then move it to edgein.

The algorithm iterates Steps 1-3 O(n^2) times, once for each candidate
edge. Computing the empty triangles for each edge requires O(n log n) time by
using the method of [4]. We can also preprocess the data in O(n^2) time and
O(n^2) space by the algorithm in [9] so that all empty triangles sharing an edge
can be computed in linear time. The admissibility of each empty triangle can
be tested in O(1) time. So Step 1 needs at most O(n^3) time. Testing whether an
edge is locally minimal with respect to one pair of triangles can be done in O(1)
time. Since e might have a linear number of triangles on each side, we may test
O(n^2) combinations of adjacent triangles. Step 4 tests an edge against at most
O(n^2) other edges and there are at most O(n^2) edges. Thus, the Algorithm
LMT(S, α, γ) runs in O(n^4) time and O(n^2) space.

The following lemmas guarantee the correctness of the algorithm. 

Lemma 1. If an empty admissible triangle t is labeled dead, then
t ∉ MWT(S, α, γ).

Proof. We prove the lemma by contradiction. Let axb be the first triangle in
MWT(S, α, γ) that is labeled dead because ab becomes either inactive or
dead. Since ab is not an edge of the convex hull, there exists a triangle
ayb ∈ MWT(S, α, γ) such that axb ∩ ayb = ab. Since axb is the first triangle
becoming dead, both axb and ayb are active immediately after labeling ab inactive
or dead. Labeling axb dead implies that ab ∩ xy ≠ ∅. Moreover, in order that ab
is labeled inactive or dead, we must have |xy| < |ab| and that neither of the
triangles xay and xby is inadmissible. Therefore, in the convex quadrilateral axby
in MWT(S, α, γ), we can replace the diagonal ab by xy to decrease the total length
without violating the angular conditions. This leads to a contradiction. □
Lemma 2. If ab ∉ MWT(S, α, γ), then ab intersects some active edge.

Proof. Let ab be some edge not in MWT(S, α, γ). Since ab ∉ MWT(S, α, γ),
ab must intersect some edge in MWT(S, α, γ). Let x_0 y_0 be such an edge for which
ab ∩ x_0 y_0 is closest to a and the triangle x_0 a y_0 ∈ MWT(S, α, γ). Since a
and b lie on opposite sides of x_0 y_0, x_0 y_0 is not on the convex hull.
Therefore, there is another triangle x_0 z y_0 in MWT(S, α, γ) adjacent to
x_0 a y_0. For convenience, we rename a as v_0. Let C_0 be the cone bounded by the
two rays originating from v_0 through x_0 and y_0, respectively. By construction,
b ∈ C_0. Note that both triangles x_0 a y_0 and x_0 z y_0 are admissible. See
Figure 1. Assume to the contrary that ab does not intersect any active edge. By
Lemma 1, x_0 v_0 y_0 and x_0 z y_0 are not labeled dead, even if x_0 y_0 is
labeled dead or inactive. This means that v_0 z ∩ x_0 y_0 = ∅. Therefore,
z ∉ C_0 and z ≠ b. Suppose z lies on the left side of v_0 b.
We rename x_0 as v_1, z as x_1 and y_0 as y_1. By construction, we discover a new
edge x_1 y_1 in MWT(S, α, γ) that intersects ab, and b lies inside the cone C_1
bounded by the two rays from v_1 through x_1 and y_1. Thus, by applying the
previous argument to x_1 v_1 y_1 and x_1 y_1, we obtain a new edge x_2 y_2 that
intersects ab, and the triangle x_2 x_1 y_2 satisfies the angular conditions.
Infinite iteration of this argument gives an infinite sequence of edges x_i y_i,
i ≥ 1, that intersect v_0 b. This contradicts the finiteness of MWT(S, α, γ). □

4 The Subgraph of the Angular Balanced Triangulation 

Denote the angular balanced triangulation of a general position point set S
by ABT(S). We first define the concept of local optimality for the angular
balanced triangulation, and subsequently describe the algorithm which can be
used to determine some of the edges in the angular balanced triangulation.

Let e be an arbitrary edge in some triangulation but not on the convex hull.
As discussed in Section 3, there exist two triangles t_1 and t_2 such that
t_1 ∩ t_2 = e and, when t_1 ∪ t_2 is convex, a corresponding pair of triangles
t_1' and t_2' such that t_1' ∩ t_2' = e'. The edge e is defined to be locally
angular balanced if t_1 ∪ t_2 is not convex or if t_1 ∪ t_2 is convex and
f(t_1, t_2) ≤ f(t_1', t_2'). A triangulation is called a locally angular
balanced triangulation (LABT) if each of its edges is locally angular balanced.
Obviously, the angular balanced triangulation must be locally angular balanced;
otherwise the exchange of the diagonals in a convex quadrilateral containing a
non-balanced edge would reduce the sum of the values.

Naturally, a concept similar to the LMT-skeleton for the minimum weight
triangulation can be considered. Correspondingly, we name the subgraph which
contains a set of edges that must be in every locally angular balanced
triangulation the LABT-skeleton and denote it by LABT(S).

Obviously, the framework of Algorithm LMT(S, α, γ) works for the determination
of LABT(S) if the locally minimal test is replaced by the locally
angular balanced test and the admissibility test is removed. Since the
locally angular balanced test runs in O(1) time, the computation of the
LABT-skeleton needs the same time and space bounds as that of LMT(S, α, γ).



5 Computational Results 



Since for a uniformly distributed random point set there almost always exists
a triangle with the smallest angle, as well as one with the largest angle, which
should be included in every triangulation, it has very little meaning to consider
the minimum weight triangulation with angular constraints on random data sets.
The data we used for computing LMT(S, α, γ) are designed as follows. First,
grid points in a fixed-size square are generated and then each point inside the
square is perturbed within a circle of small radius. By changing the value of
the radius, different types of data sets can be produced. Figures 2-6 show the
LMT-skeletons LMT(S, α, γ) obtained for a set of 64 points. In this example
the angle α changes between 0 and the value of the minimum angle α_D of the
Delaunay triangulation, i.e., 0 = α_1 < α_2 < α_3 < α_4 < α_D. We set γ = π and
set the radius to be half of the grid size in our test. We tested point sets of
size up to 100. The observation is that for this type of point set the
effectiveness of the algorithm is influenced heavily by the value of the radius.
The smaller the radius is, the larger the LMT-skeletons that can be found. Small
perturbation results in a small number of connected components in the subgraph.
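The perturbed-grid test data described above could be generated along the following lines. This Python sketch is only one reading of the description: it assumes that points on the boundary of the square stay fixed, that interior points are moved uniformly at random within a disk of the given radius, and that a seed parameter is available for reproducibility. The name perturbed_grid and all parameters are hypothetical.

```python
import math
import random

def perturbed_grid(m, spacing, radius, seed=None):
    """Return an m x m grid whose interior points are perturbed within a disk."""
    rng = random.Random(seed)
    pts = []
    for r in range(m):
        for c in range(m):
            x, y = c * spacing, r * spacing
            interior = 0 < r < m - 1 and 0 < c < m - 1
            if interior:
                # Uniform sample in a disk: sqrt corrects the radial density.
                theta = rng.uniform(0.0, 2.0 * math.pi)
                rho = radius * math.sqrt(rng.random())
                x += rho * math.cos(theta)
                y += rho * math.sin(theta)
            pts.append((x, y))
    return pts

# Example: 64 points, with the radius equal to half of the grid size.
points = perturbed_grid(8, 1.0, 0.5, seed=1)
```

Varying the radius argument reproduces the kind of family of data sets mentioned in the text, from a nearly regular grid to a strongly perturbed one.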
We use uniformly distributed random point sets for computing the
LABT-skeleton. Point sets of sizes ranging from 50 to 500 are tested. The results
in Table 1 are averages of ten runs for each size. We show the number of connected
components (#comp), the percentage of the identified edges (% MWT) in the
angular balanced triangulation, the numbers of active and dead edges (and their
ratio) and the CPU time in the table. The programs are coded in C and the
experiments are conducted on an Ultra SPARC (143 MHz) workstation. Our results
show that the algorithm works quite well for point sets of small size. Note that
we could improve the result by repeating the algorithm until no more edges can be
identified as edges of the angular balanced triangulation.



nodes   #comp   % MWT   # active/dead edges (ratio)   CPU time (sec.)
  50      1.0    0.75    182/1031 (0.177)                  0.04
 100      1.5    0.69    430/4501 (0.096)                  0.30
 200      4.3    0.62    1021/18838 (0.054)                2.37
 300     10.3    0.59    1633/43129 (0.038)                8.66
 400     24.0    0.55    2380/77288 (0.031)               21.45
 500     40.2    0.52    3135/121406 (0.026)             429.87

Table 1. The computational results for LABT-skeleton.



6 Conclusions

We have shown that the identification of the appropriate definition of local
optimality leads to a simple unified method for the computation of the subgraphs
of the optimum triangulations. It is worth noticing that this unified method also
works for determining the subgraph of other optimal triangulations with a
min-sum type quality measurement. For example, if the measure function f for each
triangle is defined as the ratio of the length of the longest edge to that of the
shortest edge, the triangulation with the minimum value is the one that balances
the lengths of the edges. We hope that further investigation of the minimum
weight triangulation with angular constraints will provide us with insight into
the design of algorithms for solving the minimum weight triangulation.

Acknowledgment: We thank Mr. Manabu Sugai, Professor Kokichi Sugihara
and Professor Masao Iri [16] for sharing with us their programs for the
implementation of our algorithms.
Abstract. In this paper, we investigate the maximum weight triangulation
of a polygon inscribed in a circle (simply, an inscribed polygon). A
complete characterization of the maximum weight triangulation of such polygons
has been obtained. As a consequence of this characterization, an O(n^2)
algorithm for finding the maximum weight triangulation of an inscribed
n-gon is designed. In the case of a regular polygon, the complexity of this
algorithm can be reduced to O(n). We also show that a tree admits a
maximum weight drawing if each of its internal nodes connects to at most 2
non-leaf nodes. The drawing can be done in O(n) time. Furthermore, we prove
a property of maximum planar graphs which do not admit a maximum
weight drawing on any set of convex points.



1 Introduction 

Triangulation of a set of points is a fundamental structure in computational
geometry. Among different triangulations, the minimum weight triangulation,
MinWT, of a set of points in the plane attracts special attention [3,6,9,10]. The
construction of the MinWT of a point set is still an outstanding open problem.
When the given point set is the set of vertices of a convex polygon (a so-called
convex point set), then the corresponding MinWT can be found in O(n^3) time
by dynamic programming [5,8].

To the best of the authors' knowledge, there has not been much research on the
maximum weight triangulation, MaxWT. From the theoretical viewpoint, the
maximum weight triangulation problem and the minimum weight triangulation
problem are equally important, and neither seems to be easier than the other.
The study of maximum weight triangulation will help us to understand the
nature of optimal triangulations.

An inscribed polygon is one all of whose vertices lie on a circle. In this paper,
we show that the maximum weight triangulation of an inscribed polygon can
be found in quadratic time, and the graph extracted from its maximum weight
triangulation by omitting the boundary edges must form a special tree.
Furthermore, if the polygon is regular, i.e., all its edges are of the same
length and all its inner angles are equal, then its maximum weight triangulation
can be found in linear time.

Wen-Lian Hsu, Ming-Yang Kao (Eds.): COCOON’98, LNCS 1449, pp. 25-34, 1998.

Straight-line drawing is a field of growing interest [2]. A special type of
straight-line drawing is the minimum weight drawing. Let C be a class of graphs,
and S be a set of points in the plane. Let G = (V, E) be a graph of C such that
V(G) = S, E is a set of non-crossing straight-line segments connecting pairs of
points of S, and the sum of the lengths of the edges in E is minimized over all
straight-line graphs of class C on S. G is called a minimum weight representative
of C with respect to S. A graph G ∈ C is said to admit a minimum weight drawing
if G is a minimum weight representative of C with respect to some point set
S. In particular, if C is the class of trees, a tree G admits a minimum weight
drawing if there exists a set S of points in the plane such that G is isomorphic
to a Euclidean minimum weight spanning tree of S. For example, tree T in
Figure 1a has a minimum weight drawing as T is isomorphic to a Euclidean
minimum weight spanning tree as given in Figure 1b.



In the area of minimum weight tree drawing, it is proved that every tree with
maximum node degree of at most five admits a minimum weight drawing, i.e.,
it can be drawn as a minimum weight spanning tree of some set of points. On
the other hand, a tree with maximum node degree of more than six cannot be
drawn as a minimum weight spanning tree [13]. Interestingly, deciding whether
a tree with maximum degree of six has a minimum weight drawing is NP-hard
[4].

The problem of constructing a minimum weight drawing for a planar
triangulation was first studied by Lenhart and Liotta [11]. They showed that the
greedy triangulation of a regular polygon is of minimum weight. Furthermore,
they investigated the drawing of maximum outerplanar graphs. A graph G is
outerplanar if it has a planar embedding such that all its nodes lie on a single
face; an outerplanar graph is maximum if all the other faces of G are bounded
by exactly three edges. They devised a linear-time algorithm that takes a
maximum outerplanar graph G as input and constructs its straight-line drawing G'
as output such that G' is a minimum weight triangulation of the set of points
representing the nodes of G.

Fig. 1. An illustration of minimum weight drawing.
In this paper, we say graph G has a maximum weight drawing, MaxWD, if G
is isomorphic to a straight-line graph in the plane with maximum weight. In
particular, we show that any tree whose internal nodes connect to at most 2
non-leaf nodes can be realised as a maximum weight drawing in linear time. We
further show that any graph with a special forbidden property does not admit
a maximum weight drawing, i.e., there does not exist a convex point set whose
maximum weight triangulation is isomorphic to this graph.

In Section 2, we present some definitions on MaxWT. In Section 3, the
MaxWT of inscribed and regular polygons will be discussed. The maximum
weight drawing of a special type of tree on a convex point set will be described
in Section 4. We shall show in Section 5 that some graphs with a certain
property do not have a MaxWD. Section 6 is the conclusion.



2 Preliminaries 

Let P be a set of points in the plane. A triangulation of P, denoted by
T(P), is a maximal set of non-crossing line segments with their endpoints in
P. It follows that the interior of the convex hull of P is partitioned into
non-overlapping triangles. The weight of a triangulation T(P) is given by

    ω(T(P)) = Σ_{p_i p_j ∈ T(P)} ω(p_i p_j),

where ω(p_i p_j) is the Euclidean length of line segment p_i p_j.

A maximum weight triangulation of P, MaxWT(P), is defined by the condition
that, over all possible T(P), ω(MaxWT(P)) = max{ω(T(P))}.





Fig. 2. An illustration of the definitions. 



Let P be a convex polygon (whose vertices are a convex point set) and T(P)
be its triangulation. A fly triangle of T(P) is one that consists of three
diagonals (Figure 2a). An ear of T(P) is a triangle containing two consecutive
boundary edges of P, which are called ear edges. An inner-spanning tree of the
vertices of P is a subgraph of T(P) whose nodes are the vertices of P and whose
edges are the internal edges of T(P) plus two ear edges, one per ear (Figure 2b).

For simplicity, in the proofs of the lemmas, we use ab < cd, etc., to denote
the comparison of the lengths of arcs or line segments, i.e., ab < cd means
$\omega(ab) < \omega(cd)$; $xy^2$ denotes the square of the length of line
segment xy.



3 Maximum weight triangulation of inscribed polygons 

The following lemma shows an important property of the maximum weight tri- 
angulation of an inscribed polygon. 

Lemma 1. Let P be an inscribed polygon. Then MaxWT(P) cannot contain
any fly triangle.



Fig. 3. For the proof of Lemma 1 (panels (a) and (b)).



Proof: By contradiction. With respect to Figure 3, let △abc be a fly triangle
of MaxWT(P). Then △abc has three neighboring triangles, say △aeb, △bfc,
and △cda. Let the intersection points of diagonals af, bd, and ce be a', b', and
c', respectively. There are two distinct cases, depending on whether center o of
the circumcircle lies inside △a'b'c' or not.

(1) Let o lie outside the triangle △a'b'c' (Figure 3a). Then o must lie inside
one of the areas bounded by (cc', c'a, $\widehat{adc}$), (aa', a'b, $\widehat{aeb}$),
or (bb', b'c, $\widehat{bfc}$), where $\widehat{abc}$ denotes arc abc. W.l.o.g.,
let o lie inside the area bounded by cc', c'a, and $\widehat{adc}$.
Then, arcs $\widehat{fbea} < \pi$ and $\widehat{cfbe} < \pi$. In quadrilateral
□aebc of MaxWT(P), given ce < ab, we have $\widehat{cfbe} < \widehat{bea}$.
Similarly, in □abfc of MaxWT(P), as af < bc, we have
$\widehat{fbea} < \widehat{cfb}$. Thus, we have
$\widehat{cfbe} + \widehat{fbea} < \widehat{bea} + \widehat{cfb}$, or
$\widehat{fbe} < 0$, a contradiction.

(2) Let o lie inside △a'b'c' (Figure 3b). In □aebc of MaxWT(P), ce < ab,
so we have $\widehat{cfbe} < \widehat{bea}$. Similarly, we have
$\widehat{adcf} < \widehat{cfb}$ as af < bc, and we have
$\widehat{bead} < \widehat{adc}$ as db < ac. Adding them accordingly, we have
$(\widehat{bead} + \widehat{adcf} + \widehat{cfbe}) < (\widehat{adc} + \widehat{cfb} + \widehat{bea})$.
Then, $(\widehat{ad} + \widehat{cf} + \widehat{be}) < 0$, a contradiction.

Both of the above cases assume center o lies inside fly triangle △abc. If center
o lies outside △abc, one of the angles of △abc must be larger than $\pi/2$. By a
lemma to be proved later (Corollary 2 in Section 5), △abc cannot be a fly
triangle. □



Corollary 1. The MaxWT(P) of an inscribed polygon P contains an inner-
spanning tree which is the maximum weight spanning tree of the vertices of P.

Proof: As MaxWT(P) does not contain any fly triangle, except for the two ear
triangles, each triangle of MaxWT(P) contains exactly one boundary edge of P.
Moreover, one of the two boundary edges in an ear triangle must belong to any
spanning tree. Thus, MaxWT(P) contains an inner-spanning tree of P. By the
property of MaxWT, the inner-spanning tree of MaxWT(P) is the maximum
weight spanning tree of the vertices of P. □

Theorem 1. The MaxWT(P) for an inscribed n-gon P can be found in $O(n^2)$
time.

Proof: Assume P = (0, 1, ..., n - 1) and all the vertex indices are modulo n. Let
$W_{i,j}$ with i < j denote the weight of $MaxWT(P_{i,j})$, where $P_{i,j}$ is the
convex subpolygon (i, i + 1, ..., j) of P. By Lemma 1, MaxWT(P) does not contain
any fly triangle. Thus, for each internal edge ij in MaxWT(P), the triangle in
$P_{i,j}$ associated with edge ij must involve either boundary edge i(i + 1) and
diagonal (i + 1)j, or boundary edge (j - 1)j and diagonal i(j - 1).

Thus, we have the following recursion for $W_{i,j}$:

$$W_{i,j} = \omega(ij) \quad \text{if } j = i + 1,$$

$$W_{i,j} = \max\{W_{i,j-1} + \omega((j-1)j),\; W_{i+1,j} + \omega(i(i+1))\} + \omega(ij) \quad \text{if } i + 1 < j.$$

Since the recursion indices i and j range from 0 to n - 1 and each evaluation
of $W_{i,j}$ takes constant time, all $W_{i,j}$ for $0 \le i, j \le n - 1$ can be
evaluated in $O(n^2)$ time. Finally, $\omega(MaxWT(P))$ is obtained as a maximum
over the n choices $0 \le i \le n - 1$, which takes another O(n) time. □
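The recursion in the proof of Theorem 1 can be sketched in Python as follows. This is an illustrative sketch, not the authors' implementation: the function name, the table layout `W[L][i]` (chain of L boundary edges starting at vertex i), and in particular the final combination step (choosing the vertex i that carries one ear and adding its two ear edges to the chain from i + 1 around to i - 1) are assumptions made here, since the final formula is not legible in the source. The input is assumed to be the vertices of a polygon inscribed in a circle, listed in boundary order.

```python
import math


def max_weight_triangulation_inscribed(points):
    """Weight of the heaviest triangulation without fly triangles.

    `points` lists the polygon vertices in boundary order. By Lemma 1 this
    is the MaxWT weight when the polygon is inscribed in a circle.
    """
    n = len(points)
    if n < 3:
        raise ValueError("need at least three vertices")

    def w(a, b):
        # Euclidean length of segment ab, vertex indices taken modulo n.
        (x1, y1), (x2, y2) = points[a % n], points[b % n]
        return math.hypot(x1 - x2, y1 - y2)

    # W[L][i]: best weight for the chain i, i+1, ..., i+L (indices mod n),
    # including the closing chord between i and i+L.
    W = [[0.0] * n for _ in range(n - 1)]
    for i in range(n):
        W[1][i] = w(i, i + 1)
    for L in range(2, n - 1):
        for i in range(n):
            via_last = W[L - 1][i] + w(i + L - 1, i + L)     # boundary edge (j-1)j
            via_first = W[L - 1][(i + 1) % n] + w(i, i + 1)  # boundary edge i(i+1)
            W[L][i] = max(via_last, via_first) + w(i, i + L)

    # Assumed final step: vertex i is an ear tip; the rest is the chain
    # from i+1 around to i-1, closed by the chord (i+1)(i-1).
    return max(
        W[n - 2][(i + 1) % n] + w(i - 1, i) + w(i, i + 1) for i in range(n)
    )
```

The table has O(n^2) entries, each filled in constant time, and the last line scans n candidates, which matches the time bound stated in the theorem.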

When P is a regular polygon, the following theorem shows that MaxWT(P)
is not unique.

Theorem 2. Any inner-spanning tree of a regular n-gon P together with the
boundary edges of P forms a MaxWT(P).

Proof: Corollary 1 implies that MaxWT(P) does not contain any cycle
formed by diagonals. Moreover, as every boundary edge of P is shorter than
any internal edge of MaxWT(P), all the internal edges of MaxWT(P) and two
edges of P form a maximum weight spanning tree. We say that a diagonal of P
bridges a piece of the boundary with k edges if they form a cycle of length
k + 1. Every inner-spanning tree of P must consist of two boundary edges and
a set of diagonals in which one diagonal bridges two boundary edges, one
diagonal bridges three boundary edges, ..., and one diagonal bridges (n - 2)
boundary edges. As P is regular, all diagonals bridging the same number of
boundary edges must be of the same length. Thus, all the inner-spanning trees
must be of the same weight. □

4 Maximum weight drawing of Caterpillar graphs 

Let C be a class of graphs and S be a set of points in the plane. Let G = (S, E) be
a graph of C such that V(G) = S, E is a set of straight-line segments connecting
pairs of points of S, and the sum of the lengths of the edges in E is maximized
over all graphs of class C on S. G is called a maximum weight representative of
C with respect to S. A graph G (∈ C) is said to admit a maximum weight drawing
(MaxWD) if G is a maximum weight representative of C with respect to some
point set S.

A caterpillar is a tree such that all internal nodes connect to at most 2
non-leaf nodes. Figure 4 gives an example of a caterpillar.




Now let C be the class of caterpillars. We say a caterpillar Gc has a maximum
weight drawing if there exists a convex point set S in the plane such that Gc is
isomorphic to a Euclidean maximum weight spanning tree of S.

In this section, we present a linear-time algorithm for the MaxWD of
caterpillars through the inner-spanning trees of the vertex set of a regular
polygon. Given a caterpillar of n nodes, we construct a regular point set, i.e.,
the vertex set of a regular n-gon, (0, 1, ..., n - 1). The drawing starts from a
head of the caterpillar Gc, i.e., an internal node with exactly one other
internal node as its neighbor. For example, nodes a and k are heads in the
caterpillar given in Figure 5a. The next step is to select a vertex, say (n - 1),
in the regular n-gon to represent the head, and to act as the center of a fan to
vertices 0, 1, ... to represent edges adjacent to the head (Figure 5b). The
drawing of the spanning tree will continue with the head's neighboring internal
node, which is represented by the last vertex in the fan (Figure 5b). The drawing
will proceed along the chain of internal nodes of Gc; the detailed algorithm is
given below.

Algorithm MaxWDRAW

Input: Caterpillar graph Gc

Output: Maximum weight spanning tree isomorphic to Gc

Method:

1  n ← |V(Gc)|; Draw a regular point set (0, 1, ..., n - 1).

2  Let V_I be the chain of internal nodes starting from a head of Gc.
   s ← n - 1; t ← 0; Draw st

3  While V_I ≠ ∅ do
       v_i ← Extract(V_I); k ← degree(v_i);
       Draw sj for j = t + 1, t + 2, ..., t + k - 1;
       t ← t + k - 1;
       if V_I ≠ ∅ then
           v_i ← Extract(V_I); k ← degree(v_i);
           Draw tj for j = s - 1, s - 2, ..., s - k + 1;
           s ← s - k + 1;
   EndDo




Fig. 5. (a) caterpillar graph (b) the corresponding maximum weight drawing. 



Theorem 3. Any caterpillar graph has a straight-line maximum weight drawing,
which can be found in linear time.

Proof: Apply algorithm MaxWDRAW to Gc. The output is a spanning tree over
a regular point set, which is isomorphic to Gc. By Theorem 2, this spanning tree
gives the maximum weight triangulation of the regular polygon formed by the
point set. Corollary 1 implies that this spanning tree is of the maximum weight.
Finally, it is easy to see that MaxWDRAW takes O(n) time. □
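Algorithm MaxWDRAW can be sketched in Python as below. The representation is an assumption made for illustration: the caterpillar is given only by the degrees of its internal nodes, listed along the chain starting from a head (`spine_degrees`), and the function returns the number of vertices n together with the drawn edges as index pairs on the regular n-gon (0, 1, ..., n - 1). The function name and the return format are hypothetical and do not come from the paper.

```python
def max_w_draw(spine_degrees):
    """Sketch of MaxWDRAW on the vertex indices of a regular n-gon.

    `spine_degrees[i]` is the degree (in the caterpillar) of the i-th
    internal node along the chain, starting from a head.
    """
    m = len(spine_degrees)
    # n = internal nodes + leaves, where leaves = sum(degrees) - 2 * (m - 1).
    n = sum(spine_degrees) - m + 2
    s, t = n - 1, 0
    edges = [(s, t)]          # step 2: draw st
    fan_from_s = True         # the head is represented by vertex s = n - 1
    for k in spine_degrees:
        if fan_from_s:
            # Draw sj for j = t+1, ..., t+k-1; the last j is the next internal node.
            for j in range(t + 1, t + k):
                edges.append((s, j))
            t += k - 1
        else:
            # Draw tj for j = s-1, ..., s-k+1; the last j is the next internal node.
            for j in range(s - 1, s - k, -1):
                edges.append((j, t))
            s -= k - 1
        fan_from_s = not fan_from_s
    return n, edges
```

For example, `max_w_draw([3, 2, 3])` draws a caterpillar with three internal nodes on a regular 7-gon. Each edge is emitted once, so the running time is linear in n. In the output, the index differences of the edges take each value from 1 to n - 1 exactly once, which is the structure of an inner-spanning tree used in the proof of Theorem 2.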



5 Forbidden graphs for maximum weight drawing on a convex point set

In this section, we shall prove that some maximum outerplanar graphs do not
admit a MaxWD; these graphs are called forbidden graphs.

Lemma 2. If P is a convex point set, then each interior angle of any fly triangle
of the MaxWT(P) must be no less than $\pi/4$.

Proof: By contradiction. W.l.o.g., assume △abd is a fly triangle in the
MaxWT(P) with $\angle a < \pi/4$ as the smallest angle, and let ah be the line
segment perpendicular to bd from a (Figure 6). As
$\angle a = (\alpha + \beta) < \pi/4$, we have

$$bd = ah \cdot (\tan\alpha + \tan\beta) < ah \cdot \tan(\alpha + \beta) < ah.$$

Replacing bd by ac (as ac > ah > bd) would arrive at another triangulation whose
weight is larger than the weight of MaxWT(P). This leads to a contradiction. □

Corollary 2. If P is a convex point set, then no interior angle of any fly triangle
of MaxWT(P) is larger than $\pi/2$.

Lemma 3. If P is a convex point set, then there cannot exist two fly triangles
sharing an edge in the MaxWT(P).

Proof: By contradiction. W.l.o.g., assume the two fly triangles are △abd and
△bcd as shown in Figure 6. We have

$$ac^2 = ab^2 + bc^2 - 2\,ab \cdot bc \cdot \cos(\angle abc) = ad^2 + dc^2 - 2\,ad \cdot dc \cdot \cos(\angle cda),$$

$$bd^2 = ab^2 + ad^2 - 2\,ab \cdot ad \cdot \cos(\angle dab) = bc^2 + dc^2 - 2\,bc \cdot dc \cdot \cos(\angle bcd).$$

From Lemma 2, since all angles of the fly triangles are larger than $\pi/4$,
$\angle abc$ and $\angle cda$ are larger than $\pi/2$, i.e., $\cos(\angle abc)$
and $\cos(\angle cda)$ are negative. Thus, we have $2ac^2 - 2bd^2 > 0$, or
ac > bd. This contradicts that bd is an edge in MaxWT(P). □

Let C be the class of all maximum outerplanar graphs. A maximum outer-
planar graph G has a maximum weight drawing if there exists a convex point
set S in the plane such that G is isomorphic to a Euclidean maximum weight
triangulation of S.

Fig. 6. Fly triangles in the MaxWT(P).



Based on Lemma 3, the following theorem shows that some maximum outer- 
planar graphs do not have maximum weight drawings. Figure 6 illustrates such 
an example. 

Theorem 4. If G{V, E) is a maximum outerplanar graph containing a simple 
cycle C with four nonconsecutive nodes which form two triangles sharing a com- 
mon edge, then G cannot have a maximum weight drawing. 

Proof: Figure 6 shows a maximum outerplanar graph which does not have a
maximum weight drawing, with cycle C = ahbecfdga and the four nonconsecutive
nodes a, b, c, d as specified in the theorem. In fact, as long as nodes a, b, c, d
are nonconsecutive, any ear edges in the triangulation (Figure 6) can be replaced
by chains of nodes (note that many edges are needed to connect these nodes to
make the graph maximum). The proof follows directly from Lemma 3, as any
triangulation of a convex polygon isomorphic to a maximum outerplanar graph
having the property specified in the theorem would imply the existence of two
fly triangles sharing a common edge. □

6 Concluding Remarks 

The maximum weight triangulation problem is the counterpart of, and no easier
than, the minimum weight triangulation problem. In this paper, we study the
maximum weight triangulation of inscribed polygons, in particular regular
polygons. Some properties of the triangulation have been obtained. We utilized
these properties to design a linear-time algorithm for maximum weight drawing
of caterpillar graphs. We also showed the property of some forbidden maximum
outerplanar graphs which do not admit maximum weight drawings.


Abstract. In this paper we present the new data structure Colored Sector
Search Tree (CSST) for solving the Nearest-Foreign-Neighbor Query
Problem (NFNQP): Given a set S of n colored points in $\mathbb{R}^D$, where
$D \ge 2$ is a constant, and a subset $S' \subseteq S$ stored in a CSST, for any
colored query point $q \in \mathbb{R}^D$ a nearest foreign neighbor in S', i.e. a
closest point with a different color, can be reported in
$O(\log n (\log\log n)^{D-1})$ time w.r.t. a polyhedral distance function that is
defined by a star-shaped polyhedron with O(1) vertices; note that this includes
the Minkowski metrics $d_1$ and $d_\infty$. It takes a preprocessing time of
$O(n (\log n)^{D-1})$ to construct the CSST. Points from S can be inserted into
the set S' and removed from S' in $O(\log n (\log\log n)^{D-1})$ time. The CSST
uses $O(n (\log n)^{D-1})$ space. We present an application of the data structure
in the parallel simulation of solute transport in aquifer systems by particle
tracking. Other applications may be found in GIS (geo information
systems) and in CAD (computer aided design). To our knowledge the
CSST is the first data structure to be reported for the NFNQP.



1 Introduction 

1.1 Related work 

The Closest Pair Problem (CPP) is one of the fundamental problems in
Computational Geometry [13]. Given a set S of points in $\mathbb{R}^D$,
$D \ge 2$ and constant, the CPP is to find a closest pair of points in S, where
the distances are measured w.r.t. an $L_t$-metric $d_t$ ($1 \le t \le \infty$).
The Closest Foreign Pair Problem (CFPP) is a generalization of the CPP, where
the input S is a set of colored points in $\mathbb{R}^D$. The CFPP asks for a
pair of points realizing the minimal bichromatic distance in S, i.e. a pair of
points with minimal foreign distance. In the All-Nearest-Foreign-Neighbors
Problem (ANFNP) one has to compute for each point p of a fixed colored point
set S a nearest foreign neighbor in S w.r.t. a given metric, e.g. an $L_t$-metric
$d_t$, $1 \le t \le \infty$ ([1]).

Optimal $\Theta(n \log n)$ algorithms for the CFPP and the ANFNP in two
dimensions are given in [1,6,7]. Efficient randomized algorithms for the CFPP
in $D \ge 2$ dimensions are presented in [10]. Efficient randomized algorithms
for the on-line CFPP, where the point set S is modified by insertions, have been
given in [9]. Recently, an optimal $\Theta(n \log n)$ algorithm for the ANFNP in
arbitrary dimensions for a fixed number of colors has been given in [8].

In this paper we study the problem of reporting nearest foreign neighbors
inside a given set S of colored points in $\mathbb{R}^D$ (or inside a subset
$S' \subseteq S$) for any colored query point $q \in \mathbb{R}^D$.



1.2 Applications 

Our new data structure CSST meets the requirements of one of the paralleliza- 
tion strategies that are supported by our massively-parallel 3D particle tracking 
system that is used for the simulation of reactive solute transport in aquifer 
systems [3,5]: 

The strategy uses a distribution of the particles and the parameters given 
for the domain, e.g. the velocity field, once for all at the beginning of the sim- 
ulation; during the simulation the particles inside a processor have to query for 
the parameters at their specific location which involves network communication 
between the processors. Each processor stores its particles in a CSST. The par- 
ticles come out of different molecule classes which are represented by different 
colors in the terminology of our data structure. For the computation of sorption
and chemical reactions we compute in each time step for each particle p the 
L'^-closest particle q with a molecule type different from p’s. In the query, only 
the unadsorbed particles are of relevance, which correspond to the points in the
subset S'.

To reduce the communication overhead, the processors use packers that collect
queries and inject them into the network as a single package. Since in
production runs of the simulation the number of processors (about 50) is very
small compared with the total number of particles (about 500 million) used, the
communication overhead is acceptable.

Other applications of the CSST may be found in GIS (geo information
systems) ([13,14,4,11]) for $D \ge 2$ dimensions, and in 3D CAD (computer aided
design) for specialized vertex selections. It is worthwhile to notice that from
the $\Omega(n \log n)$ lower bound for the ANFNP in the algebraic decision tree
model of computation ([1]) we obtain an $\Omega(\log n)$ lower bound for the
amortized complexity of a nearest-foreign-neighbor query.

1.3 Polyhedral distances

Let S be a set of n points in $\mathbb{R}^D$, $D \ge 2$. A polyhedral distance
function is defined by a polyhedron P given by its O(1) vertices such that P is
star-shaped w.r.t. the origin. For $p \in \mathbb{R}^D$ and
$0 \le \delta \le \infty$ we denote by $P_\delta(p)$ the open polyhedron which we
obtain by scaling P w.r.t. the origin by a factor of $\delta$ and translating the
resulting polyhedron by the vector p. The (oriented) polyhedral distance from
$p = (p.1, \ldots, p.D) \in \mathbb{R}^D$ to
$q = (q.1, \ldots, q.D) \in \mathbb{R}^D$ is defined by

$$d(p, q) := \begin{cases} \max\{\delta \mid q \notin P_\delta(p)\} & \text{if } p.i \ne q.i \text{ for at least one } i \in \{1, \ldots, D\}, \\ 0 & \text{otherwise.} \end{cases}$$

Well-known polyhedral distances are given by
$d_\infty(p, q) := \max_{1 \le i \le D} |p.i - q.i|$, defined by
$P = \{p \in \mathbb{R}^D \mid p.i \in \{-1, 1\}\ \forall i = 1, \ldots, D\}$, and
$d_1(p, q) := \sum_{i=1}^{D} |p.i - q.i|$, defined by
$P = \{p \in \mathbb{R}^D \mid p.i \in \{1, 0\}\ \forall i = 1, \ldots, D \wedge \sum_{i=1}^{D} p.i = 1\}$,
also known as Minkowski-metrics ([13])^1.



1.4 Nearest-foreign-neighbor query problem 

We assign each point $p \in S$ a color $c(p) \in \mathbb{N}$. For c(p),
$p \in S$, we denote by $S_{c(p)}$ the subset of S containing all points with
color c(p). The (oriented) polyhedral foreign distance $d_f(p, q)$ from p to q is
defined by

$$d_f(p, q) := \begin{cases} d(p, q) & \text{if } c(p) \ne c(q), \\ \infty & \text{otherwise.} \end{cases}$$

The nearest-foreign-neighbor distance from p to S w.r.t. $d_f$ is defined by
$\delta(p, S) := \min\{d_f(p, q) \mid q \in S\}$. The nearest foreign neighbors of
p in S w.r.t. $d_f$ are the points in $S \setminus S_{c(p)}$ lying on the boundary
of $P_{\delta(p,S)}(p)$. We formulate the nearest-foreign-neighbor query problem
(NFNQP) as follows:

Definition 1. Given a set S of n colored points in $\mathbb{R}^D$, a possibly
empty subset $S' \subseteq S$, and a polyhedral foreign distance function $d_f$
defined by a polyhedron P which is star-shaped w.r.t. the origin with O(1)
vertices. Preprocess S in such a way that

a. points from S can be inserted into and removed from S' efficiently, and

b. for any colored query point $q \in \mathbb{R}^D$, a nearest foreign neighbor
in the current set S' w.r.t. $d_f$ can be reported efficiently.
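As a point of reference for the query in Definition 1, the following Python sketch answers a nearest-foreign-neighbor query by a linear scan under the two Minkowski metrics $d_\infty$ and $d_1$ mentioned above. It is only a brute-force baseline taking O(n) time per query, not the CSST; the function names and the representation of a colored point as a `(coordinates, color)` pair are assumptions made for illustration.

```python
import math


def d_inf(p, q):
    # Minkowski metric d_infinity: largest coordinate difference.
    return max(abs(a - b) for a, b in zip(p, q))


def d_1(p, q):
    # Minkowski metric d_1: sum of coordinate differences.
    return sum(abs(a - b) for a, b in zip(p, q))


def nearest_foreign_neighbor(colored_points, q, q_color, dist=d_inf):
    """Return (point, distance) of a nearest point whose color differs
    from q_color, or (None, inf) if no such point exists."""
    best, best_d = None, math.inf
    for p, color in colored_points:
        if color == q_color:
            continue  # same color: foreign distance is infinite
        d = dist(q, p)
        if d < best_d:
            best, best_d = p, d
    return best, best_d
```

Such a baseline is mainly useful for checking the answers of a faster structure on small inputs.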



1.5 Contribution of this paper 

We introduce the new data structure Colored Sector Search Tree (CSST) for
answering nearest-foreign-neighbor queries w.r.t. polyhedral foreign distance
functions efficiently. The CSST is obtained by "cascading" complete binary
skeleton search trees (CBSST) [12] and colored quadrant priority skeleton search
trees (CQPSST) ([2]). The main result of the paper is as follows:

Given a set S of n colored points in $\mathbb{R}^D$, a subset $S' \subseteq S$,
and a polyhedral foreign distance function $d_f$ defined by a polyhedron P with
O(1) vertices which is star-shaped w.r.t. the origin. A CSST storing S' using
$O(n (\log n)^{D-1})$ space can be computed in $O(n (\log n)^{D-1})$ time such
that

a. In $O(\log n (\log\log n)^{D-1})$ time, a point $s \in S$ can be inserted into
S' and removed from S'.

b. In $O(\log n (\log\log n)^{D-1})$ time, for any colored query point
$q \in \mathbb{R}^D$, a nearest foreign neighbor in $S' \subseteq S$ w.r.t. the
polyhedral distance function $d_f$ can be reported.

The result differs from the $\Omega(\log n)$ lower bound by a factor of
$(\log\log n)^{D-1}$. For fixed S' = S and S as query point set the CSST improves
the result in [8] to handle not only O(1) but O(n) colors in the ANFNP for the
cost of an additional $(\log\log n)^{D-1}$ factor in the time complexities. To
the best of our knowledge the CSST is the first data structure to be reported
for the NFNQP.

^1 For $1 \le t < \infty$ the Minkowski metric $d_t$ is defined by
$d_t(p, q) := \left(\sum_{i=1}^{D} |p.i - q.i|^t\right)^{1/t}$.

2 The colored sector search tree 

Let S be a set of colored points in $\mathbb{R}^D$ and $\Delta$ be a
(D + 1)-faced polyhedron with one vertex in the origin and supporting
hyperplanes $h_1, \ldots, h_{D+1}$ (the polyhedron P from section 1.3 can be
divided into O(1) such elementary polyhedrons $\Delta$). W.l.o.g. we assume that
the hyperplanes $h_1, \ldots, h_D$ contain the origin. Let $h_i^+$ denote the
(closed) halfspace on that side of the hyperplane $h_i$ which contains $\Delta$.
By $h_i(p)$ (resp. $h_i^+(p)$) we denote the hyperplane (resp. the halfspace)
which we obtain by translating $h_i$ (resp. $h_i^+$) by the vector p. By
$\Delta_\delta(p)$ we denote the polyhedron which we obtain by scaling $\Delta$
by a factor of $\delta$ w.r.t. the origin and translating the resulting
polyhedron by the vector p. Hence $\Delta_\infty(p)$ denotes the entire sector
$\bigcap_{i=1}^{D} h_i^+(p)$. Given the polyhedron $\Delta$, the foreign sector
neighbors in S of a colored query point $q \in \mathbb{R}^D$ are those points in
$(S \setminus S_{c(q)}) \cap \Delta_\infty(q)$ such that the foreign distance
from q to these points is minimal among all points in
$(S \setminus S_{c(q)}) \cap \Delta_\infty(q)$.

The colored sector search tree (CSST) supports the following operations in
$O(\log n (\log\log n)^{D-1})$ time each:

a. insert(p): Insert the point $p \in S$ into the subset $S' \subseteq S$ stored
in the CSST.

b. delete(p): Delete the point p from the subset $S' \subseteq S$ stored in the
CSST.

c. nfsn(q): Report a foreign sector neighbor of q inside the subset
$S' \subseteq S$ stored in the CSST.

In the following we assume that the points in S are in general position, i.e. no
two points of S lie on a common hyperplane $h_i(r)$, $i \in \{1, \ldots, D+1\}$,
$r \in \mathbb{R}^D$. We will show in section 2.5 how to overcome this
restriction.

2.1 Basic structures of the CSST

A base tree for key values $\mathcal{K}_1, \ldots, \mathcal{K}_n$ is a 0-2 binary
tree ([12]), i.e. each inner node has exactly two sons, and a leaf search tree
for the key values, i.e. for every key value $\mathcal{K}_i$,
$i \in \{1, \ldots, n\}$, there exists one leaf in the tree. For a node k we
denote by LST(k) and RST(k) the left and the right subtrees of the tree rooted
by k, and by ST(k) the subtree consisting of k, LST(k) and RST(k). Each inner
node k of a base tree stores a split value which is the minimal key value stored
in a leaf of RST(k). The depth $\eta(k)$ of a node k in a base tree, say T, is
defined as the number of nodes on the path from the root of T to k. The base
tree is almost balanced in the sense that for any two leaves $l_1$ and $l_2$ we
have $|\eta(l_1) - \eta(l_2)| \le 1$. In standard terms, the base tree is an
almost balanced binary skeleton search tree.

We assign each point $p \in S$ a unique key value $\mathcal{K}(p)$. From a base
tree for the key values $\mathcal{K}(p)$, $p \in S$, we obtain a colored binary
skeleton search tree (CBSST) by inserting into each node enough space to store a
colored point in $\mathbb{R}^D$, which is possibly the nil-point. In the CBSST we
store points according to the following conditions:

a. Each point $p \in S$ lies on the path from the root to the leaf with value
$\mathcal{K}(p)$.

b. If a node stores a point, then its father also stores a point.

Denote by S' the subset of S that is currently stored in the CBSST. A point
$p \in S$ is inserted into S' as follows: We sift the point p down the tree along
the path from the root to the leaf whose value is $\mathcal{K}(p)$. We store the
point p in the first node of this path in which no point is stored so far. Since
all key values $\mathcal{K}(p)$ are unique by our restriction, condition (a.)
ensures that such a node exists. A point $p \in S'$ is deleted from S' as
follows: After a binary search for the node storing p we remove p from this node.
We fill the gap by sifting successor points up the tree. It is easy to show that
inserting a point $p \in S$ into S' and deleting a point $p \in S'$ from the set
S' can be done in $O(\log n)$ time.
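The insertion and deletion rules just described can be illustrated with a small Python sketch. Several simplifications here are assumptions for illustration only: a point is represented by its key alone (colors and coordinates are omitted), the base tree is built once from the sorted key list, and the names `Node`, `insert` and `delete` are hypothetical. When a gap is filled during deletion, either child that currently stores a point may be pulled up, since every point stored below a node has its root-to-leaf path through that node.

```python
class Node:
    """Node of a base tree over sorted, distinct keys (leaf search tree)."""

    def __init__(self, keys):
        self.point = None  # stored key, or None for the nil-point
        if len(keys) == 1:
            self.split, self.left, self.right = keys[0], None, None
        else:
            mid = (len(keys) + 1) // 2
            self.left, self.right = Node(keys[:mid]), Node(keys[mid:])
            self.split = keys[mid]  # minimal key in the right subtree

    def child_for(self, key):
        # Next node on the root-to-leaf path of `key`; None below a leaf.
        if self.left is None:
            return None
        return self.left if key < self.split else self.right


def insert(root, key):
    # Sift down the path of `key` and store it in the first free node.
    node = root
    while node is not None and node.point is not None:
        if node.point == key:
            raise ValueError("key already stored")
        node = node.child_for(key)
    if node is None:
        raise ValueError("key is not a leaf value of the base tree")
    node.point = key


def delete(root, key):
    # Binary search for the node storing `key`.
    node = root
    while node is not None and node.point != key:
        node = node.child_for(key)
    if node is None:
        raise KeyError(key)
    # Fill the gap by sifting successor points up the tree.
    while True:
        kids = [c for c in (node.left, node.right)
                if c is not None and c.point is not None]
        if not kids:
            node.point = None
            return
        node.point = kids[0].point
        node = kids[0]
```

Both operations walk a single root-to-leaf path, so on an almost balanced base tree each takes time proportional to the tree height, in line with the O(log n) bound stated above.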

From the properties of the CBSST we easily obtain the following Theorem: 

Theorem 1. Given a colored query point $q \in \mathbb{R}^D$, let
$k_1, \ldots, k_m$ denote the nodes on the root-to-leaf path which we traverse in
a binary search in the CBSST for the value $\mathcal{K}(q)$. All points in S'
with key value smaller than $\mathcal{K}(q)$ are either stored in a node $k_i$ or
in a subtree $LST(k_i)$ for $i \in \{1, \ldots, m\}$. All points in S' with key
value equal to or greater than $\mathcal{K}(q)$ are either stored in a node $k_i$
or in a subtree $RST(k_i)$ for $i \in \{1, \ldots, m\}$.

Assume that we are given for the points $p \in S$ a primary key
$\mathcal{K}_1(p)$ and a secondary key $\mathcal{K}_2(p)$. As described in [2] we
may base the colored quadrant priority search tree (CQPSST) on a CBSST for the
key values $\mathcal{K}_1(p)$, $p \in S$. Additionally each node k of the CQPSST
stores the maximal $\mathcal{K}_2$-value of the points stored in ST(k) with a
color different from the color of the point stored in k. For storing points in
the CQPSST we have the following additional condition:

c. The $\mathcal{K}_2$-values of the points stored along an arbitrary
root-to-leaf path are in increasing order.

Again denote by S' the subset of S that is currently stored in the CQPSST.
Inserting a point $p \in S$ into S' and deleting a point $p \in S'$ from the set
S' can be done in $O(\log n)$ time ([2]). From [2] we obtain the following
Theorem:

Theorem 2. For any colored query point $q \in \mathbb{R}^D$, a point
$q' \in S' \cap (S \setminus S_{c(q)})$ for which
$\mathcal{K}_1(q) \le \mathcal{K}_1(q')$, $\mathcal{K}_2(q) \le \mathcal{K}_2(q')$
and $\mathcal{K}_2(q')$ is minimal among all points in
$S' \cap (S \setminus S_{c(q)})$ that satisfy the first two conditions can be
reported by the CQPSST in $O(\log n)$ time; if such a point does not exist the
query returns the nil-point.

It is easy to show that both data structures CBSST and CQPSST use O(n)
space to store a set S of n colored points.

We obtain the CSST by “cascading” CBSSTs and CQPSSTs. The reason 
why the trees are based on the base tree and not on an arbitrary binary search 
tree is to avoid rebalancing in the CSST. 



2.2 The cascading mechanism 

Denote by $v_i$, $i \in \{1, \ldots, D+1\}$, the vector perpendicular on $h_i$
starting in the origin. Let $v_i$ be oriented inside $h_i^+$ for
$i \in \{1, \ldots, D\}$ and $v_{D+1}$ be oriented outside $h_{D+1}^+$. For
$i \in \{1, \ldots, D+1\}$ we define the key-value functions
$\mathcal{K}_i : \mathbb{R}^D \to \mathbb{R}$ by

$$\mathcal{K}_i(q) := t \text{ such that } q \in h_i(t v_i),$$

which are injective on S by our assumption of general position. In level $L_1$
of the CSST we store the values $\mathcal{K}_1(p)$, $p \in S$, in a CBSST. In
level $L_i$, $i \in \{2, \ldots, D-1\}$, of the CSST we additionally store for
each inner node k in a tree of level $L_{i-1}$ the values in
$\{\mathcal{K}_i(p) \mid p \in ST(k)\}$ in a separate CBSST. In level $L_D$ of the
CSST we generate for each inner node k in a tree of level $L_{D-1}$ a separate
CQPSST with primary key values $\{\mathcal{K}_D(p) \mid p \in ST(k)\}$ and
secondary key values $\{\mathcal{K}_{D+1}(p) \mid p \in ST(k)\}$ for the points
stored in ST(k). Again we denote by S' the subset of S that is currently stored
in the CSST.

Theorem 3. A CSST for a set S of n colored points in $\mathbb{R}^D$ can be stored
using $O(n (\log n)^{D-1})$ space.

Proof. By construction, level $L_1$ requires O(n) space. Level $L_2$ contains at
most $2^j$ trees of size bounded by $2^{\lceil \log n \rceil - j + 1} - 1$ for
each $j \in \{0, \ldots, \lceil \log n \rceil\}$. The levels $L_2, \ldots, L_D$
can be interpreted as a collection of CSSTs with D - 1 levels for the trees in
level $L_2$ (see above). It can be seen easily that we may apply the Theorem
recursively, and we obtain that the space needed to store all D levels of the
CSST is bounded by

$$O\left(\sum_{j=0}^{\lceil \log n \rceil} 2^j \cdot \frac{2^{\lceil \log n \rceil}}{2^j} \left(\log \frac{2^{\lceil \log n \rceil}}{2^j}\right)^{D-2}\right) = O(n (\log n)^{D-1}),$$

which proves the Theorem. □



2.3 The operations insert and delete 

Inserting (resp. deleting) a point $p \in S$ into (resp. from) the set
$S' \subseteq S$ currently stored in the CSST is performed in D steps, where p
gets inserted (resp. removed) into (resp. from) some trees of level $L_i$ in the
ith step.

In the first step we insert (resp. remove) p into (resp. from) the CBSST of level
$L_1$. Denote by $N_1$ the set of nodes thus considered. In the second step we
insert (resp. remove) p into (resp. from) those CBSSTs of level $L_2$ which are
rooted by nodes in $N_1$. We denote by $N_2$ the set of nodes in the trees of
level $L_2$ thus traversed. Generally, in the ith step, $2 \le i < D$, we insert
(resp. remove) p into (resp. from) those CBSSTs of level $L_i$ which are rooted
by a node in $N_{i-1}$; by $N_i$ we denote the set of nodes on level $L_i$ thus
traversed. In the Dth step we insert (resp. remove) p into (resp. from) those
CQPSSTs of level $L_D$ which are rooted by a node in $N_{D-1}$ (see [2]).

Theorem 4. A point $p \in S$ can be inserted into and removed from the set
$S' \subseteq S$ currently stored in a CSST in $O(\log n (\log\log n)^{D-1})$
time.

Proof. By construction we have $|N_1| = O(\log n)$. Level $L_2$ contains at most
$2^j$ trees of size bounded by $2^{\lceil \log n \rceil - j + 1} - 1$ for
$j \in \{0, \ldots, \lceil \log n \rceil\}$. The levels $L_2, \ldots, L_D$ can be
interpreted as a collection of CSSTs with D - 1 levels for the trees in level
$L_2$ (see section 2.2). For a CSST in level $L_2$ with m nodes, inserting and
deleting can be done in $O(\log m (\log\log m)^{D-2})$ time by an inductive
argument. Hence the total time for inserting p into the CSSTs on level $L_2$
corresponding to the nodes in $N_1$ is bounded by the claimed
$O(\log n (\log\log n)^{D-1})$, which completes the proof. □



2.4 The operation nearest foreign sector neighbor (nfsn) 

The operation nfsn reports for a query point q a foreign sector neighbor of q
inside the subset $S' \subseteq S$ stored in the CSST. Let $q \in \mathbb{R}^D$
be a colored query point. A nearest foreign neighbor of q in the subset
$S' \subseteq S$ of points currently stored in the CSST is computed in D steps as
follows:

In the first step we perform a binary search for the value $\mathcal{K}_1(q)$ in
the CBSST of level $L_1$. Denote by $N_1$ the set of nodes contained in the
root-to-leaf path thus obtained. Obviously $|N_1| = O(\log n)$. In the second
step we perform a binary search for the value $\mathcal{K}_2(q)$ in those CBSSTs
of level $L_2$ which are rooted by a node in $N_1$. We denote by $N_2$ the set of
nodes in the trees of level $L_2$ thus traversed. Generally, in the ith step,
$2 \le i < D$, we perform a binary search for the value $\mathcal{K}_i(q)$ in
those CBSSTs of level $L_i$ which are rooted by a node in $N_{i-1}$; by $N_i$ we
denote the set of nodes on level $L_i$ thus traversed. In the Dth step we perform
a range query as stated in Theorem 2 for the CQPSSTs of level $L_D$ which are
rooted by a node in $N_{D-1}$.

Note that Theorem 1 ensures in the ith step that all points of
$S' \cap \bigcap_{j=1}^{i} h_j^+(q)$ are either stored in nodes of $N_i$ or in the
right subtrees of these nodes.

Theorem 5. For any colored query point $q \in \mathbb{R}^D$ a nearest foreign
sector neighbor in the set $S' \subseteq S$ currently stored in the CSST w.r.t. a
polyhedral foreign distance function can be reported in
$O(\log n (\log\log n)^{D-1})$ time.

Proof. A computation similar to the computation in the proof of Theorem 4
shows that the number of nodes traversed is $O(\log n (\log\log n)^{D-1})$ and
that performing the D steps given above requires $O(\log n (\log\log n)^{D-1})$
time. Hence, computing a foreign sector neighbor of q among the points in S' can
be done in $O(\log n (\log\log n)^{D-1})$ time. □

2.5 Overcoming the restriction of general position 

So far we assumed that $|h_i(r) \cap S| \le 1$ for all
$i \in \{1, \ldots, D+1\}$ and for all $r \in \mathbb{R}^D$, i.e. no two points
of S lie on a common hyperplane $h_i(r)$. This restriction can be removed without
any additional overhead in the run time by assuming the lexicographic order
among the points in $\mathbb{R}^D$ in addition to the linear order existing among
their key values while performing the operations insert, delete and query for
nearest foreign neighbors in the CSST. Since D is assumed to be a constant, the
upper time bounds for insertion, deletion and nearest-foreign-neighbor query do
not change.

3 Nearest-foreign-neighbor queries 

Again, denote by S' a set of colored points in IR^ and S' C S he Si subset currently 
stored in a CSST. Denote hj df Si polyhedral foreign distance function that is 
generated by a polyhedron P which is starshaped w.r.t. the origin given by its 
0(1) vertices. Applying the results of section 2 to the NFNQPis straightforward 
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Theorem 6. For any colored query point q G a nearest foreign neigh- 

bor in the set w.r.t. the polyhedral distance function df can he reported in 
0(log n(log log time. 

Proof. It is easy to show that the surface of the star-shaped polyhedron $P$ can
be triangulated in $O(1)$ time. Therefore $P$ can be subdivided into a constant
number of $(D+1)$-faced subpolyhedra $P_1, \ldots, P_m$ such that each polyhedron
$P_j$, $j \in \{1, \ldots, m\}$, has the origin as one vertex. The supporting hyperplanes
of the polyhedron $P_j$ are denoted by $h_{j,i}$, $j \in \{1, \ldots, m\}$, $i \in \{1, \ldots, D+1\}$,
analogously to Section 2. By $v_{j,i}$, $j \in \{1, \ldots, m\}$, $i \in \{1, \ldots, D+1\}$, we denote
the vector perpendicular to $h_{j,i}$ starting in the origin. Assume that $v_{j,i}$ is oriented
inside hF and that is oriented outside hF, We generate a CSST for each 

polyhedron $P_j$ with key value functions $\mathcal{K}_{j,i} : \mathbb{R}^D \to \mathbb{R}$ given by

$\mathcal{K}_{j,i}(q) = t$ such that $q \in h_{j,i}(t\,v_{j,i})$.

By Theorem 5, foreign sector neighbors can be reported in the sectors $P_j(q)$,
$j \in \{1, \ldots, m\}$, in total $O(\log n (\log\log n)^{D-1})$ time since $m$ is treated as a
constant. The foreign neighbor of $q$ is the closest among the $O(m)$ foreign sector
neighbors thus found, and that one is reported. This completes the proof. □

4 Conclusions 

We have presented the colored sector search tree (CSST). After a preprocessing
time of $O(n (\log n)^{D-1})$ the CSST reports for any colored query point $q \in \mathbb{R}^D$
a nearest foreign neighbor in a subset $S' \subseteq S$ of a fixed set $S \subset \mathbb{R}^D$
containing $n$ colored points in $O(\log n (\log\log n)^{D-1})$ time using
$O(n (\log n)^{D-1})$ space. Points from $S$ can be inserted into the set $S'$ and
removed from $S'$ in $O(\log n (\log\log n)^{D-1})$ time. Distances are measured w.r.t. a
polyhedral foreign distance function defined by a star-shaped polyhedron with
$O(1)$ vertices, e.g. one of the Minkowski distance functions $d_1$ and $d_\infty$. We have
presented an application of the CSST for parallelly simulating reactive solute
transport in aquifer systems by particle tracking.

Abstract. Given an $n$-vertex polygonal curve $P = [p_1, p_2, \ldots, p_n]$ in
the 2-dimensional space $\mathbb{R}^2$, we consider the problem of approximating
$P$ by finding another polygonal curve $P' = [p'_1, p'_2, \ldots, p'_m]$ of $m$ vertices
in $\mathbb{R}^2$ such that the vertex sequence of $P'$ is an ordered subsequence of
the vertices along $P$. The goal is to either minimize the size $m$ of $P'$ for a
given error tolerance $\varepsilon$ (called the min-# problem), or minimize the
deviation error $\varepsilon$ between $P$ and $P'$ for a given size $m$ of $P'$ (called the
min-$\varepsilon$ problem). We present useful techniques and develop a number of efficient
algorithms for solving the 2-D min-# and min-$\varepsilon$ problems under two
commonly-used error criteria for curve approximations. Our algorithms
improve substantially the space bounds of the previously best known
results on the same problems while maintaining the same time bounds as
those of the best known algorithms.



1 Introduction 

In this paper, we consider the problem of approximating an arbitrary $n$-vertex
polygonal curve $P$ in the 2-dimensional space $\mathbb{R}^2$ by another polygonal curve $P'$
whose vertices form an ordered subset of the vertices along the original curve $P$.
An $n$-vertex polygonal curve $P$ in $\mathbb{R}^2$ is specified by an ordered set
$[p_1, p_2, \ldots, p_n]$ of vertex points in $\mathbb{R}^2$ such that any two consecutive
vertices $p_i$, $p_{i+1}$ are connected by the line segment $\overline{p_i p_{i+1}}$, $1 \le i < n$.
It is possible that such a polygonal curve has self-intersections. Specifically,
the problem is, given a polygonal curve $P = [p_1, p_2, \ldots, p_n]$ in $\mathbb{R}^2$, to
determine another polygonal curve $P' = [p'_1, p'_2, \ldots, p'_m]$ of $m$ vertices in
$\mathbb{R}^2$ such that:

1. $m \le n$ (desirably, $m$ is much smaller than $n$),

2. the vertex sequence of $P'$ is a subsequence of the vertex sequence of $P$, with
$p'_1 = p_1$ and $p'_m = p_n$, and

3. each edge $\overline{p'_i p'_{i+1}}$ of $P'$ is an approximating line segment of the
subcurve $[p_j, p_{j+1}, \ldots, p_k]$ of $P$, where $p'_i = p_j$, $p'_{i+1} = p_k$, and $j < k$.
That is, for every point $p$ of the subcurve $[p_j, p_{j+1}, \ldots, p_k]$ of $P$, the error
incurred by using $\overline{p'_i p'_{i+1}}$ ($= \overline{p_j p_k}$) to approximate $p$, based on a
given error criterion, is no bigger than a specified error tolerance $\varepsilon$. Such a
line segment $\overline{p'_i p'_{i+1}}$ is called the approximating line segment of the
corresponding subcurve $[p_j, p_{j+1}, \ldots, p_k]$ of $P$.

The parameter $m$ specifies the size (i.e., the number of vertices) of the
“compressed” version $P'$ of $P$, and the parameter $\varepsilon$ controls the “closeness”
of $P'$ to $P$ (under a certain error criterion). Actually, there is a trade-off
between the two parameters $m$ and $\varepsilon$: the smaller $\varepsilon$ is, the larger $m$ tends
to be, and vice versa. Based on this relation between $m$ and $\varepsilon$, Imai and Iri
[8,9] considered two versions of optimization problems on approximating
polygonal curves in the plane: (i) given $\varepsilon$, minimize $m$ (called the min-#
problem), and (ii) given $m$, minimize $\varepsilon$ (called the min-$\varepsilon$ problem). In this
paper, we study both the min-# and min-$\varepsilon$ problems in 2-D space.

Curve approximation problems appear in many applications, such as image 
processing, computer graphics, cartography, and data compression. In these ap- 
plications, it is often desirable to approximate a complex graphical or geometric 
object (possibly specified by a polygonal curve in $\mathbb{R}^2$) by a simpler object that
captures the essence of the complex one yet achieves a certain degree of data
compression [11].

An error criterion defines the goodness of fit in terms of the deviations be- 
tween the approximated and approximating objects. Different error criteria have 
been used in solving various polygonal curve approximation problems (e.g., see 
[2,4,5,6,8,9,10,12,13,14]). In this paper, we will use two commonly-used error
criteria for studying polygonal curve approximations: the error criterion used in
[2,8,9,10], which we call the tolerance zone criterion, and the criterion used in
[4,9,12], which we call the infinite beam criterion (the infinite beam criterion is
also called the parallel-strip criterion in [4,9,12]).

Under the tolerance zone criterion, the approximation error between a segment
$\overline{p_j p_k}$ and the corresponding subcurve $S = [p_j, p_{j+1}, \ldots, p_k]$ of $P$ is
defined as the maximum distance in an $L_h$ metric between $\overline{p_j p_k}$ and each
point on the subcurve $S$ (we consider $h \in \{1, 2, \infty\}$). Because $P$ is a polygonal
curve, the maximum $L_h$ distance between $\overline{p_j p_k}$ and the points of $S$ can be
computed by simply finding the maximum $L_h$ distance between $\overline{p_j p_k}$ and each
vertex $p_l$ of $S$ (with $j < l < k$). We denote by $dist_h(\overline{p_j p_k}, p_l)$ the $L_h$
distance between the segment $\overline{p_j p_k}$ and the vertex $p_l$. Under the infinite beam
criterion, the approximation error between a segment $\overline{p_j p_k}$ and the
corresponding subcurve $S = [p_j, p_{j+1}, \ldots, p_k]$ of $P$ is defined as the maximum
$L_h$ distance between the line $L(\overline{p_j p_k})$ that contains $\overline{p_j p_k}$ and each point
of the subcurve $S$. We denote by $dist_h(L(\overline{p_j p_k}), p_l)$ the $L_h$ distance between
the line $L(\overline{p_j p_k})$ and a vertex $p_l$ of $S$. According to these error criteria, the
approximation error incurred by using a curve $P' = [p'_1, p'_2, \ldots, p'_m]$ to
approximate $P$ is defined as the maximum error among those of the edges of $P'$
with respect to their corresponding subcurves of $P$ (e.g.,
$\max_{i=1}^{m-1}\{\max\{dist_h(\overline{p'_i p'_{i+1}}, p_l) \mid p'_i = p_j,\ p'_{i+1} = p_k,\ \text{and } j < l < k\}\}$).
The parameter $\varepsilon$ specifies the upper bound of the approximation error of $P'$ with
respect to $P$.
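
The two criteria can be compared on a small example. The following is a minimal Python sketch for the $L_2$ metric only; the function names and the representation of a curve as a list of (x, y) tuples with 0-based indices are assumptions made for illustration, not notation from the paper.

```python
import math

def dist_point_line(a, b, p):
    """L2 distance from point p to the infinite line through a and b (a != b)."""
    cross = (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])
    return abs(cross) / math.hypot(b[0] - a[0], b[1] - a[1])

def dist_point_segment(a, b, p):
    """L2 distance from point p to the closed segment ab (a != b)."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    t = ((p[0] - a[0]) * dx + (p[1] - a[1]) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))  # clamp the projection onto the segment
    return math.hypot(p[0] - (a[0] + t * dx), p[1] - (a[1] + t * dy))

def tolerance_zone_error(curve, j, k):
    """Max distance from the segment curve[j]curve[k] to the vertices strictly between."""
    return max((dist_point_segment(curve[j], curve[k], curve[l])
                for l in range(j + 1, k)), default=0.0)

def infinite_beam_error(curve, j, k):
    """Max distance from the line through curve[j], curve[k] to the vertices strictly between."""
    return max((dist_point_line(curve[j], curve[k], curve[l])
                for l in range(j + 1, k)), default=0.0)

curve = [(0, 0), (-3, 4), (10, 0)]
print(tolerance_zone_error(curve, 0, 2))  # 5.0: the nearest segment point is the endpoint (0, 0)
print(infinite_beam_error(curve, 0, 2))   # 4.0: only the distance to the supporting line counts
```

The middle vertex projects outside the segment, so the tolerance zone error exceeds the infinite beam error on this input.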

Using the tolerance zone criterion, Imai and Iri [8,9] and Melkman and
O’Rourke [10] studied the 2-D min-# and min-$\varepsilon$ problems; their algorithms for
the 2-D min-# (resp., min-$\varepsilon$) problem take $O(n^2 \log n)$ (resp., $O(n^2 \log^2 n)$) time
and $O(n^2)$ space. Recently, Chan and Chin [2] reduced the time complexity of the
2-D min-# (resp., min-$\varepsilon$) problem under the tolerance zone criterion to $O(n^2)$
(resp., $O(n^2 \log n)$). Based on the infinite beam criterion, Toussaint [12] solved
the 2-D min-# problem in $O(n^2 \log n)$ time and $O(n^2)$ space, and Imai and Iri [9]
gave an $O(n^2 \log^2 n)$ time and $O(n^2)$ space algorithm for the 2-D min-$\varepsilon$ problem.
Eu and Toussaint [4] then published another algorithm for the 2-D min-# (resp.,
min-$\varepsilon$) problem under the infinite beam criterion that was claimed to take $O(n^2)$
(resp., $O(n^2 \log n)$) time and $O(n^2)$ space. They also used the infinite beam
criterion based on the $L_1$ and $L_\infty$ metrics. Very recently, Barequet et al. presented
efficient algorithms for approximating polygonal curves in 3-D and higher
dimensional spaces under the tolerance zone criterion [1]. Varadarajan [13] studied
the min-# and min-$\varepsilon$ problems for 2-D monotone polygonal paths, using the uniform
measure of error. He gave $O(n^{4/3+\delta})$ time and space algorithms for both
problems, where $\delta > 0$ is an arbitrarily small constant. However, those algorithms
cannot be extended to the tolerance zone and infinite beam criteria.

In this paper, we present a number of efficient algorithms for the 2-D min-#
and min-$\varepsilon$ problems [4,2]. In particular, we solve the min-# and min-$\varepsilon$
problems under both the tolerance zone and infinite beam criteria in the same time
bounds as [4,2], but using only $O(n)$ space, in comparison with the $O(n^2)$ space
used in [4,2]. Our algorithms are based on several new ideas and techniques.
In [4,2,8,9,10,12], the min-# problem is solved by first constructing a directed
acyclic graph $G = (V, E)$ for the curve approximation (where $V$ is the vertex set
of the curve $P$), and then finding a $p_1$-to-$p_n$ shortest path in $G$. $G$ has an arc
$e_{ij} = (p_i, p_j) \in E$ if and only if (iff) $\overline{p_i p_j}$ is the approximating line
segment for the chain $[p_i, p_{i+1}, \ldots, p_j]$ of $P$. For the 2-D case, the number of
edges in $G$, $|E|$, is $O(n^2)$, and the time complexity of the min-# algorithms in
[4,2,8,9,10,12] is in general dominated by the time for constructing $G$. For the
min-$\varepsilon$ problem, the algorithms in [4,2,8,9,10,12] first compute and store the
$O(n^2)$ approximation errors for all the segments $\overline{p_i p_j}$ defined on $P$, and then
perform a binary search on these errors for the sought error $\varepsilon$, at each step of
the search applying a min-# algorithm. In comparison, we are able to compute a
$p_1$-to-$p_n$ shortest path in $G$ without having to maintain $G$ explicitly for the
min-# problems, and to store only a fraction of the $O(n^2)$ approximation errors
for the binary search process for the min-$\varepsilon$ problems. Thus, we reduce the space
bound of the 2-D min-# and min-$\varepsilon$ problems to $O(n)$. Our results can be further
generalized to several other cases of the 2-D polygonal curve approximation
problems, as well as to some 3-D cases of those problems (our 3-D results will be
given in the full paper).

2 Useful Structures 

Let $P = [p_1, p_2, \ldots, p_n]$ be an arbitrary polygonal curve in 2-D that is to be
approximated using an $L_h$ metric, where $h \in \{1, 2, \infty\}$. Due to the similarity of
our algorithms for these three metrics, we will first illustrate the main ideas and
steps of our algorithms with the $L_2$ metric; the differences between the algorithms
for the $L_2$ metric and those for the $L_1$ and $L_\infty$ metrics will be discussed later.
Henceforth, unless otherwise specified, the metric we use is $L_2$, and the subscript
2 is omitted in all notations related to $L_2$ distances (e.g., $dist(\overline{p_i p_j}, p_k)$ is
used instead of $dist_2(\overline{p_i p_j}, p_k)$).

Based on the tolerance zone criterion, a vertex $p_k$ of $P$ is within distance $\varepsilon$
from a line segment $\overline{p_i p_j}$, where $i < k < j$, if the following conditions are all
satisfied [2]:

Condition 1: $dist(L(\overline{p_i p_j}), p_k) \le \varepsilon$.

Condition 2: If the convex angle defined by $\overline{p_i p_k}$ and $\overline{p_i p_j}$ is greater than
$\pi/2$, then $d(p_k, p_i) \le \varepsilon$, where $d(p_k, p_i)$ denotes the $L_2$ distance between
$p_k$ and $p_i$, and an angle defined by two line segments is said to be convex if the
angle is no larger than $\pi$.

Condition 3: If the convex angle defined by $\overline{p_j p_k}$ and $\overline{p_j p_i}$ is greater than
$\pi/2$, then $d(p_k, p_j) \le \varepsilon$.

These three conditions together define a region called the error tolerance
region of the line segment $\overline{p_i p_j}$. Let $ray(p_i, p_j)$ denote the ray emanating
from $p_i$ and passing through $p_j$. Since the line segment
$\overline{p_i p_j} = ray(p_i, p_j) \cap ray(p_j, p_i)$, the error tolerance region of $\overline{p_i p_j}$ is
the intersection of the error tolerance regions of $ray(p_i, p_j)$ and $ray(p_j, p_i)$.

In comparison, for the infinite beam criterion, only Condition 1 needs to be
satisfied, and hence the shape of the error tolerance region of a segment
$\overline{p_i p_j}$ is an infinite “strip” of width $2\varepsilon$ in 2-D.

Let $p_i$, $p_j$, and $p_k$ be vertices of $P$ with $i < k < j$. If
$dist(ray(p_i, p_j), p_k) \le \varepsilon$, then $ray(p_i, p_j)$ is said to be an approximating
ray of $p_k$. If $dist(ray(p_i, p_j), p_k) \le \varepsilon$ for each $k$ with $i < k < j$, then
$ray(p_i, p_j)$ is an approximating ray of the chain $[p_i, p_{i+1}, \ldots, p_j]$ of $P$ (an
approximating ray, for short). Thus, under the tolerance zone criterion,
$dist(\overline{p_i p_j}, p_k) \le \varepsilon$ iff $dist(ray(p_i, p_j), p_k) \le \varepsilon$ and
$dist(ray(p_j, p_i), p_k) \le \varepsilon$ [2]. In other words, $\overline{p_i p_j}$ is an approximating line
segment of $p_k$ iff $ray(p_i, p_j)$ and $ray(p_j, p_i)$ are both approximating rays of $p_k$.
Therefore, for a given $\varepsilon$, one can first compute all approximating rays
$ray(p_i, p_j)$, then all approximating rays $ray(p_j, p_i)$, with $1 \le i < j \le n$, and
finally find the set of approximating line segments from the set of approximating
rays [2].

Consider the tolerance zone criterion. In 2-D, for two vertices $p_i$ and $p_k$, let
$r_a$ and $r_b$ be two rays emanating from $p_i$ such that the distance between $p_k$ and
each of $r_a$ and $r_b$ is exactly $\varepsilon$. Let $D_{ik}$ be the whole plane if
$d(p_i, p_k) \le \varepsilon$, and let $D_{ik}$ be the convex region bounded by $r_a$ and $r_b$
otherwise. Then, by Conditions 1-3, $dist(ray(p_i, p_j), p_k) \le \varepsilon$ iff $p_j \in D_{ik}$ [2].
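
The equivalence between the ray distance test and wedge membership can be checked numerically. The sketch below is an illustration only: the helper names are hypothetical, and the wedge is tested through the angle between the two directions (its half-angle is $\arcsin(\varepsilon / d(p_i, p_k))$), which is one possible way to represent $D_{ik}$ rather than necessarily the representation used in [2].

```python
import math

def dist_point_ray(a, b, p):
    """L2 distance from p to the ray that starts at a and passes through b (a != b)."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    t = max(0.0, ((p[0] - a[0]) * dx + (p[1] - a[1]) * dy) / (dx * dx + dy * dy))
    return math.hypot(p[0] - (a[0] + t * dx), p[1] - (a[1] + t * dy))

def in_wedge(p_i, p_k, p_j, eps):
    """True iff p_j lies in D_ik, the wedge with apex p_i around the eps-disc at p_k."""
    ux, uy = p_k[0] - p_i[0], p_k[1] - p_i[1]
    d = math.hypot(ux, uy)
    if d <= eps:
        return True  # D_ik is the whole plane
    vx, vy = p_j[0] - p_i[0], p_j[1] - p_i[1]
    # Unsigned angle in [0, pi] between the directions p_i -> p_k and p_i -> p_j.
    theta = math.atan2(abs(ux * vy - uy * vx), ux * vx + uy * vy)
    return theta <= math.asin(eps / d)  # half-angle of the tangent wedge

p_i, p_k, eps = (0.0, 0.0), (10.0, 0.0), 1.0
for p_j in [(5.0, 0.2), (5.0, 1.0), (-5.0, 0.0)]:
    print(p_j, in_wedge(p_i, p_k, p_j, eps), dist_point_ray(p_i, p_j, p_k) <= eps)
```

For each test point the two printed booleans agree, as the stated equivalence predicts.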

3 Min-# and Min-$\varepsilon$ Problems with the $L_2$ Metric

The algorithms in [2,4] for solving the 2-D versions of the min-# and min-$\varepsilon$
problems all use $O(n^2)$ space. In this section, we present several algorithms for
the 2-D min-# and min-$\varepsilon$ problems. The time bounds of our algorithms match
those of their corresponding solutions in [2,4], but our space bounds are $O(n)$
instead of $O(n^2)$. We will mainly describe the algorithms under the tolerance
zone criterion. The different aspects of the algorithms under the infinite beam
criterion will be pointed out at appropriate places.

3.1 2-D min-# algorithms

The basic approach for solving the 2-D min-# problem on an arbitrary polygonal
curve $P = [p_1, p_2, \ldots, p_n]$ is as discussed in Section 1: construct a directed
acyclic graph $G = (V, E)$ for the curve $P$ (where $V = \{p_1, p_2, \ldots, p_n\}$ and every
edge $e_{ij}$ of $E$, $1 \le i < j \le n$, represents an approximating segment $\overline{p_i p_j}$ of
$P$), and find a shortest $p_1$-to-$p_n$ path in $G$. Note that most of the previous 2-D
min-# algorithms [4,2,8,9,10,12] separate the stage of constructing the graph $G$
from the stage of computing a shortest path in $G$, thus having to store $G$
explicitly, which requires $O(n^2)$ space. Our idea for solving the 2-D min-# problem
is to mix these two stages together. In particular, we present a technique for
finding a shortest path in $G$ incrementally. This technique computes the edges of
$G$ only as they are needed by the shortest path computation. Hence it avoids
maintaining $G$ explicitly, while still being able to find a shortest path in $G$ in a
certain topological-sort fashion. As it turns out, only $O(n)$ space is needed by our
2-D min-# algorithm.

For $1 \le i < j \le n$, let $SD(p_i, p_j)$ denote the length of a shortest path from
$p_i$ to $p_j$ in $G$, and let $w(e_{ij})$ denote the weight of an edge $e_{ij} \in E$. For this
problem, $w(e_{ij}) = 1$ for every $e_{ij} \in E$ (but our algorithm still works even if
$w(e_{ij})$ is of any fixed value). Since the graph $G$ is directed acyclic, it is clear
that the inductive relation

$SD(p_1, p_j) = \min\{SD(p_1, p_k) + w(e_{kj}) \mid 1 \le k < j \le n \text{ and } e_{kj} \in E\}$,

where $SD(p_1, p_1) = 0$, holds. This immediately suggests an incremental algorithm
for computing $SD(p_1, p_j)$ (from the $SD(p_1, p_k)$'s, with $1 \le k < j$). As in the rest
of this section, we WLOG describe only our algorithm for computing the length
$SD(p_1, p_n)$ of a shortest $p_1$-to-$p_n$ path in $G$ (the algorithm can be easily modified
to produce, in addition to $SD(p_1, p_n)$, a shortest path tree of $G$ rooted at $p_1$).

Let $D_{ik}$ be defined as in Section 2, for $1 \le i < k \le n$. That is, $D_{ik}$ is either
the whole plane if $d(p_i, p_k) \le \varepsilon$ or is otherwise the planar cone (or wedge) that is
bounded by the two rays emanating from $p_i$ and tangent to the disc of radius $\varepsilon$
centered at $p_k$. For $1 \le i < j \le n$, let $F_{ij} = \bigcap_{k=i+1}^{j} D_{ik}$ and let
$B_{ij} = \bigcap_{k=i}^{j-1} D_{jk}$. Observe that, if $F_{ij}$ (resp., $B_{ij}$) is not empty, then
every ray $r$ in $F_{ij}$ (resp., $B_{ij}$) that starts from $p_i$ (resp., $p_j$) has nonempty
intersection with the disc of radius $\varepsilon$ centered at $p_k$, for each $i < k < j$. Thus,
under the tolerance zone criterion, $\overline{p_i p_j}$ is an approximating line segment of $P$
(and hence $e_{ij} \in E$) iff $p_j \in F_{ij}$ and $p_i \in B_{ij}$. Note that $F_{ij}$ (resp.,
$B_{ij}$) always consists of one (possibly empty) cone. Hence each $F_{ij}$ (resp., $B_{ij}$)
takes $O(1)$ space to store. Also note that if $\overline{p_i p_j}$ is an approximating line
segment of $P$, then $\overline{p_i p_j} \subseteq F_{ij} \cap B_{ij}$.

Our incremental algorithm, based on the above inductive relation among the
$SD(p_1, p_k)$'s, is described as follows. We start with the basis $SD(p_1, p_1) = 0$. Now
suppose that for a vertex $p_j$ such that $2 \le j \le n$, we have obtained $SD(p_1, p_k)$
for every $k$ with $1 \le k < j$. We must show how to obtain $SD(p_1, p_j)$. Clearly,
the key to computing $SD(p_1, p_j)$ from the $SD(p_1, p_k)$'s is to identify all the edges
$e_{kj} \in E$ such that $1 \le k < j$. To find out whether an edge $e_{kj} \in E$, we need to
compute $F_{kj}$ and $B_{kj}$ and to test whether both $p_j \in F_{kj}$ and $p_k \in B_{kj}$. Testing
whether $p_j \in F_{kj}$ and $p_k \in B_{kj}$ can be easily done in $O(1)$ time for each $k$
if $F_{kj}$ and $B_{kj}$ are already available, so we focus on computing the $F_{kj}$'s and
$B_{kj}$'s.

Assume that we have computed and maintained $F_{k,j-1}$ for every $k$ with
$1 \le k \le j-1$ (WLOG, let $F_{k,k}$ be the whole plane). Then by definition,
$F_{k,j} = F_{k,j-1} \cap D_{kj}$. So computing an $F_{k,j}$ from an already available
$F_{k,j-1}$ takes $O(1)$ time. In contrast, we need not maintain the $B_{kj}$'s. Simply,
the $B_{kj}$'s can be computed by definition: $B_{k,j} = B_{k+1,j} \cap D_{jk}$. Hence each
$B_{k,j}$ can be obtained from $B_{k+1,j}$ in $O(1)$ time.

It is now clear that, once the $SD(p_1, p_k)$'s and $F_{k,j-1}$'s are available for every
$k$ with $1 \le k < j$, the edges $e_{kj} \in E$ can all be identified in $O(j)$ time. Thus
$SD(p_1, p_j)$ can also be computed in $O(j)$ time. Maintaining the $SD(p_1, p_k)$'s
and $F_{k,j-1}$'s uses $O(n)$ space. Therefore, our 2-D min-# algorithm under the
tolerance zone criterion takes altogether $O(n^2)$ time and $O(n)$ space.
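
The incremental scheme above can be sketched in Python as follows. This is an illustrative sketch, not the authors' implementation: representing each cone as an interval of polar angles (or `None` for the whole plane and `()` for the empty cone), the helper names, the 0-based indexing, and the assumption that the vertices are pairwise distinct are all choices made here.

```python
import math

TWO_PI = 2.0 * math.pi
WHOLE = None   # cone that covers the whole plane
EMPTY = ()     # empty cone

def wedge(apex, center, eps):
    """D: interval of directions from apex whose ray passes within eps of center."""
    dx, dy = center[0] - apex[0], center[1] - apex[1]
    d = math.hypot(dx, dy)
    if d <= eps:
        return WHOLE
    mid, half = math.atan2(dy, dx), math.asin(eps / d)
    return (mid - half, mid + half)

def _near(angle, ref):
    """Shift angle by a multiple of 2*pi so that it lies within pi of ref."""
    return angle + TWO_PI * round((ref - angle) / TWO_PI)

def intersect(c1, c2):
    """Intersection of two cones with a common apex (each narrower than pi)."""
    if c1 == EMPTY or c2 == EMPTY:
        return EMPTY
    if c1 is WHOLE:
        return c2
    if c2 is WHOLE:
        return c1
    mid2 = 0.5 * (c2[0] + c2[1])
    shift = _near(mid2, 0.5 * (c1[0] + c1[1])) - mid2
    lo, hi = max(c1[0], c2[0] + shift), min(c1[1], c2[1] + shift)
    return (lo, hi) if lo <= hi else EMPTY

def contains(cone, apex, p):
    """True iff the direction from apex to p lies in the cone."""
    if cone is WHOLE:
        return True
    if cone == EMPTY:
        return False
    ang = math.atan2(p[1] - apex[1], p[0] - apex[0])
    ang = _near(ang, 0.5 * (cone[0] + cone[1]))
    return cone[0] <= ang <= cone[1]

def min_num_vertices(P, eps):
    """Fewest vertices of an approximating curve under the L2 tolerance zone criterion."""
    n = len(P)
    SD = [float("inf")] * n   # SD[j]: fewest edges on a path from P[0] to P[j]
    SD[0] = 0
    F = [WHOLE] * n           # F[k] holds F_{k,j-1}; updated in place to F_{k,j}
    for j in range(1, n):
        B = WHOLE             # B_{k,j}, rebuilt for every j and never stored across j
        for k in range(j - 1, -1, -1):
            F[k] = intersect(F[k], wedge(P[k], P[j], eps))
            B = intersect(B, wedge(P[j], P[k], eps))
            if (SD[k] + 1 < SD[j] and contains(F[k], P[k], P[j])
                    and contains(B, P[j], P[k])):
                SD[j] = SD[k] + 1
    return SD[n - 1] + 1      # vertices = edges + 1

print(min_num_vertices([(0, 0), (1, 5), (2, 0)], 1.0))  # 3: the spike cannot be skipped
print(min_num_vertices([(0, 0), (1, 5), (2, 0)], 6.0))  # 2: one segment suffices
```

Only the arrays `SD` and `F` and the single cone `B` are kept, so the sketch uses $O(n)$ space while the double loop gives $O(n^2)$ time. Floating-point comparisons at cone boundaries are not treated specially here.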

Lemma 1. The 2-D min-# problem on an arbitrary polygonal curve under the
tolerance zone criterion can be solved in $O(n^2)$ time and $O(n)$ space.

Our 2-D min-# algorithm under the infinite beam criterion, although likewise
based on the inductive relation among the $SD(p_1, p_k)$'s, is somewhat different
from the algorithm under the tolerance zone criterion presented above. Due
to the space limit, we leave the description of this algorithm to the full paper.

3.2 2-D min-$\varepsilon$ algorithms

The basic approach for solving the 2-D min-$\varepsilon$ problems is as follows: perform
binary search on a sorted set of the $O(n^2)$ approximation errors of the polygonal
curve $P$, at each step of the search using the corresponding min-# algorithm
on $P$ and a certain approximation error $\varepsilon'$. Since our 2-D min-# algorithms in
Subsection 3.1 all use $O(n)$ space, the main difficulty in improving the $O(n^2)$
space bound of the 2-D min-$\varepsilon$ algorithms in [2,4] now lies in the binary search
process of this approach.

The previous 2-D min-$\varepsilon$ algorithms [4,2,8,9,10,12] explicitly store the $O(n^2)$
approximation errors of $P$ in a sorted array for the binary search process. Our
2-D min-$\varepsilon$ results are based on the observation that binary search on a set $A$ can
be performed in multiple stages: in the first stage, perform binary search on a
sorted subset of the set $A$ (and hence only this subset needs to be stored), and then
recursively perform the remaining stages on an appropriate subset of $A$. Our
technique actually stores only $O(n)$ judiciously chosen sample approximation
errors of $P$ out of the total $O(n^2)$, while still enabling us to perform binary search
on all the $O(n^2)$ errors of $P$, without increasing the time bound of the binary
search process. The space bound for all our 2-D min-$\varepsilon$ algorithms is hence $O(n)$.
This technique could be applicable to other problems involving similar binary
search processes.

Our technique consists of $O(1)$ stages, each of which performs the following
computation. (1) Select $O(n)$ samples from the set $S$ of all the elements that are
currently active for the binary search; these samples are such that between any
two consecutive samples, there is a provably “small” subset of $S$. (2) Perform
binary search on the $O(n)$ samples. (3) At the end of this binary search, reduce
the problem to one such “small” subset of the set $S$ (thus only the elements in
this subset continue to be currently active). The details of this technique are
unfolded as follows.
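
The staged search can be illustrated with a simplified two-stage Python sketch. It is an assumption-laden illustration rather than the three-stage algorithm described in this section: `compute_set(k)` stands for a hypothetical routine that recomputes the k-th set of candidate values on demand, `feasible` stands for a monotone test (in the paper's setting, whether the min-# algorithm run with that error yields at most $m$ vertices), and `s` is the sampling step.

```python
from bisect import bisect_left

def smallest_feasible(sorted_vals, feasible):
    """Smallest value v in sorted_vals with feasible(v) true; None if there is none.

    Assumes feasible is monotone: false up to some threshold, true afterwards.
    """
    lo, hi = 0, len(sorted_vals)
    while lo < hi:
        mid = (lo + hi) // 2
        if feasible(sorted_vals[mid]):
            hi = mid
        else:
            lo = mid + 1
    return sorted_vals[lo] if lo < len(sorted_vals) else None

def staged_search(num_sets, compute_set, feasible, s):
    """Smallest feasible value over all sets, storing samples instead of every value."""
    # Stage 1: keep only every s-th element of each set, which is recomputed on demand.
    samples = []
    for k in range(num_sets):
        vals = sorted(compute_set(k))
        samples.extend(vals[s - 1::s])
    samples.sort()
    b = smallest_feasible(samples, feasible)            # upper bracket, None means +infinity
    idx = bisect_left(samples, b) if b is not None else len(samples)
    a = samples[idx - 1] if idx > 0 else None           # lower bracket, None means -infinity
    # Stage 2: recompute the sets and keep only the active elements in (a, b].
    active = []
    for k in range(num_sets):
        for v in compute_set(k):
            if (a is None or v > a) and (b is None or v <= b):
                active.append(v)
    active.sort()
    return smallest_feasible(active, feasible)

sets = [[(k * 37 + j * 11) % 211 for j in range(20)] for k in range(9)]
print(staged_search(9, lambda k: sets[k], lambda v: v >= 100, 4))
```

The sketch trades time for space in the same spirit as the text: each set is recomputed in every stage, and only the samples and the active elements are held in memory.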

First, we need to organize the $O(n^2)$ approximation errors into $n$ sets, each of
which is of size $O(n)$. Let $err(\overline{p_i p_j}) = \max_{i<k<j}\{dist(\overline{p_i p_j}, p_k)\}$ (resp.,
$err(L(\overline{p_i p_j})) = \max_{i<k<j}\{dist(L(\overline{p_i p_j}), p_k)\}$) denote the approximation
error of the segment (resp., line) $\overline{p_i p_j}$ (resp., $L(\overline{p_i p_j})$).

Since err{p~pj) = m8ix{err{L{p~p])), m8ix{^.{d{pi,pk), d{pj,pk)}}, for all the 
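segments $\overline{p_i p_j}$ such that $1 \le i < j \le n$, we have
$\{err(\overline{p_i p_j}) \mid 1 \le i < j \le n\} \subseteq \{err(L(\overline{p_i p_j})) \mid 1 \le i < j \le n\} \cup \{d(p_i, p_j) \mid 1 \le i < j \le n\}$.
Let $ERR(P) = \{err(\overline{p_i p_j}) \mid 1 \le i < j \le n\}$,
$L\text{-}ERR(P) = \{err(L(\overline{p_i p_j})) \mid 1 \le i < j \le n\}$,
and $V\text{-}ERR(P) = \{d(p_i, p_j) \mid 1 \le i < j \le n\}$. Since $L\text{-}ERR(P) \cup V\text{-}ERR(P)$ is a
superset of $ERR(P)$, the sought error $\varepsilon \in ERR(P)$ that is the solution for the
2-D min-$\varepsilon$ problem is contained in $L\text{-}ERR(P) \cup V\text{-}ERR(P)$. Hence we search for
the $\varepsilon \in ERR(P)$ by performing binary search on $L\text{-}ERR(P) \cup V\text{-}ERR(P)$. Note
that $|L\text{-}ERR(P) \cup V\text{-}ERR(P)| = O(n^2)$.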

For every $i$ with $1 \le i < n$, let $L\text{-}ERR_i$ be the set of the approximation errors
$err(L(\overline{p_i p_j}))$ for the lines $L(\overline{p_i p_j})$, and $V\text{-}ERR_i$ be the set of the
$d(p_i, p_j)$'s, for all $j$ such that $i < j \le n$ (assume that
$err(L(\overline{p_i p_{i+1}})) = 0$). Note that all the errors in $L\text{-}ERR_i$ can be computed in
$O(n \log n)$ time and $O(n)$ space by using Toussaint’s approach based on maintaining
on-line convex hulls of planar points [12] (in fact, $L\text{-}ERR_i$ can also be obtained in
the same complexity bounds by using a tree-guided scheme). Also, it is easy to
compute $V\text{-}ERR_i$ in $O(n)$ time and space. Let $ERR_i = L\text{-}ERR_i \cup V\text{-}ERR_i$. Then
$|ERR_i| < 2n$ and $ERR_i$ can be obtained in $O(n \log n)$ time and $O(n)$ space. Since
$L\text{-}ERR(P) \cup V\text{-}ERR(P) = \bigcup_{i=1}^{n-1} ERR_i$, we organize the $O(n^2)$ approximation
errors of $L\text{-}ERR(P) \cup V\text{-}ERR(P)$ into the sets $ERR_1, ERR_2, \ldots, ERR_{n-1}$. WLOG, we
assume that the errors in $L\text{-}ERR(P) \cup V\text{-}ERR(P)$ are distinct (ties can be easily
broken in a systematic way).

The following lemma, which has been used in various selection algorithms 
before, is a key to our 0(n) space binary search algorithm. 

Lemma 2. Suppose that a set $S$ of $r$ distinct elements is organized as $m$ sorted
sets $C_i$ of size $O(r/m)$ each. For every $i = 1, 2, \ldots, m$, let $C'_i$ be the subset
of $C_i$ that consists of every $s$-th element of $C_i$ (i.e., the $s$-th, $(2s)$-th, $(3s)$-th,
..., elements of $C_i$). Let $C' = \bigcup_{i=1}^{m} C'_i$. If $w$ (resp., $z$) is the $\alpha$-th (resp.,
$\beta$-th) smallest element of $C'$, with $w < z$, then there are at most
$s(\beta - \alpha + m - 1)$ elements of $S$ that are between $w$ and $z$ (i.e., these elements
are $> w$ but $< z$).

Proof. See the proof of Lemma 1 in [3].
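
The bound of Lemma 2 can be checked empirically on small inputs. The following Python sketch is only a sanity check under assumed names (`every_sth`, `lemma2_holds`), not part of the cited proof; it samples every $s$-th element of each sorted set and verifies the bound for every pair of samples.

```python
from bisect import bisect_left, bisect_right
import random

def every_sth(sorted_vals, s):
    """The s-th, (2s)-th, (3s)-th, ... elements of a sorted list (1-based ranks)."""
    return sorted_vals[s - 1::s]

def lemma2_holds(sets, s):
    """Check the bound s * (beta - alpha + m - 1) for every pair of samples w < z."""
    m = len(sets)
    everything = sorted(x for c in sets for x in c)
    samples = sorted(x for c in sets for x in every_sth(sorted(c), s))
    for a in range(len(samples)):
        for b in range(a + 1, len(samples)):
            w, z = samples[a], samples[b]
            # Number of elements strictly between w and z in the union of all sets.
            between = bisect_left(everything, z) - bisect_right(everything, w)
            if between > s * ((b - a) + m - 1):   # beta - alpha equals b - a
                return False
    return True

rng = random.Random(0)
values = rng.sample(range(100000), 600)           # distinct elements
sets = [values[i * 100:(i + 1) * 100] for i in range(6)]
print(lemma2_holds(sets, 10))
```

With distinct elements the check should succeed for any step `s`, since the lemma holds for every choice of the sets.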

Now from the $n-1$ sets $ERR_1, ERR_2, \ldots, ERR_{n-1}$ of size $O(n)$ each, we
need to choose $O(n)$ sample elements for the binary search process. Our basic
idea for the sampling is as follows:

Partition the $n-1$ sets $ERR_k$ into $\sqrt{n}$ groups $G_i$ of (roughly) $\sqrt{n}$ sets
each. Treat each group $G_i$ (of size $O(n^{3/2})$) as one single sorted set and
select $O(\sqrt{n})$ sample elements from $G_i$, such that there are $O(n)$ elements
of $G_i$ between every two consecutive samples from $G_i$. The total number
of samples so selected from all the groups is $O(n)$.

Note that, since we use only $O(n)$ space, we cannot explicitly store a group $G_i$
for the sampling process. Instead, we sample from the $\sqrt{n}$ sets of $G_i$ with the
following procedure:

Procedure Sampling

1. For every set $ERR_k$ of $G_i$, first compute $ERR_k$ and sort it; then select every
$(\sqrt{n})$-th element from the sorted set $ERR_k$, and put the selected elements in
the set $S_i$ for $G_i$.

2. Sort the set $S_i$ and choose every $(\sqrt{n})$-th element from $S_i$. These chosen
elements form the samples of $G_i$ and are put into the set $Sample(G_i)$.

The total time of the above procedure is $O(n^{3/2} \log n)$, since we need to
compute and sort each of the $\sqrt{n}$ sets $ERR_k$ of $G_i$. The space is $O(n)$, because we
only need to store each $ERR_k$ once in Step 1, and store the set $S_i$ of size $O(n)$. The
size of $Sample(G_i)$ is clearly $O(\sqrt{n})$. The quality of the samples in $Sample(G_i)$
is ensured by the following lemma.

Lemma 3. There are $O(n)$ elements of $G_i$ between every two consecutive
samples in $Sample(G_i)$.

To avoid unnecessary repetitions in describing our solutions to various 2-D
min-$\varepsilon$ problems, we assume below that we will be solving an “abstract” 2-D
min-$\varepsilon$ problem. Let $T_{min-\#}$ denote the time bound of an appropriate 2-D min-#
algorithm (note that each of our 2-D min-# algorithms uses $O(n)$ space). This
“abstract” 2-D min-$\varepsilon$ algorithm consists of three stages.

The First Stage

In the first stage, we perform Procedure Sampling on each of the $\sqrt{n}$
groups $G_i$ and obtain $O(n)$ samples, in altogether $O(n^2 \log n)$ time and $O(n)$
space. Let $Sample = \bigcup_{i=1}^{\sqrt{n}} Sample(G_i)$. We then perform binary search on the
sorted set $Sample$, at each step of the search using the min-# algorithm on
$P$ and a certain approximation error of $Sample$. This binary search takes
$O(T_{min-\#} \log n)$ time. Therefore, this stage uses $O(n^2 \log n + T_{min-\#} \log n)$ time
and $O(n)$ space.

At the end of the binary search of the first stage on the set $Sample$, we
obtain two values $a$ and $b$ that are two consecutive errors in $Sample$, such that
the sought error $\varepsilon$ which is the solution to the min-$\varepsilon$ problem satisfies
$a < \varepsilon < b$. Then, the elements of $L\text{-}ERR(P) \cup V\text{-}ERR(P)$ between $a$ and $b$ are
regarded as currently active, and all other elements are not. The set of the
currently active elements (i.e., between $a$ and $b$) is characterized by the following
lemma.

Lemma 4. There are $O(n^{3/2})$ elements of $L\text{-}ERR(P) \cup V\text{-}ERR(P)$ between $a$
and $b$.

The Second Stage

In the second stage, we first partition the $O(n^{3/2})$ currently active elements
into $\sqrt{n}$ subsets $G'_i$ of size $O(n)$ each. The following steps carry out this
partitioning. (1) Compute $ERR_k$, for every $k$ with $1 \le k \le n-1$, and find the
number of the currently active elements in $ERR_k$ (by comparing the elements of
$ERR_k$ with $a$ and $b$). (2) Partition the currently active elements of the $ERR_k$'s
into $\sqrt{n}$ subsets by associating each $G'_i$ with several appropriate $ERR_k$'s; this is
done by a simple prefix sum computation on the numbers of the currently active
elements in $ERR_1, ERR_2, \ldots, ERR_{n-1}$. This partitioning process takes $O(n^2 \log n)$
time and $O(n)$ space (the costly time is on re-computing the $ERR_k$'s).

Next, we compute each set $G'_i$ from its associated $ERR_k$'s, sort it, and select
as a sample every $(\sqrt{n})$-th element from $G'_i$. Note that $|G'_i| = O(n)$ and there
are $\sqrt{n} + 1$ elements of $G'_i$ between every two consecutive samples from $G'_i$. Let
$Sample'$ be the set of such selected samples from all the $\sqrt{n}$ $G'_i$'s. Then
$|Sample'| = O(n)$. Performing binary search on $Sample'$ again gives two consecutive
errors $a'$ and $b'$ in $Sample'$ such that $a' < \varepsilon < b'$. The binary search of the second
stage has the same complexity bounds as the first stage. Based on Lemma 2, it is
now not hard to see that between $a'$ and $b'$, there are $O(n)$ (currently active)
elements of $L\text{-}ERR(P) \cup V\text{-}ERR(P)$.

The Third Stage

The third stage computes the $O(n)$ currently active elements from the $ERR_k$'s,
and simply performs binary search on these $O(n)$ elements. This binary search
obtains the sought error $\varepsilon$. The third stage is again carried out in the same
complexity bounds as the first stage.

In summary, the above “abstract” 2-D min-$\varepsilon$ algorithm runs in
$O(n^2 \log n + T_{min-\#} \log n)$ time and uses $O(n)$ space.

Lemma 5. The 2-D min-$\varepsilon$ problem under the tolerance zone criterion can be
solved in $O(n^2 \log n)$ time and $O(n)$ space.

A similar result holds under the infinite beam criterion, which we leave to
the full paper.

4 Min-# and Min-$\varepsilon$ Algorithms with $L_1$ and $L_\infty$ Metrics

In this section, we sketch our algorithms for the 2-D min-# and min-$\varepsilon$ problems
under the tolerance zone and infinite beam criteria that are based on the $L_1$
and $L_\infty$ metrics. On one hand, the algorithms with the $L_1$ and $L_\infty$ metrics
are quite similar to those with the $L_2$ metric that were described in Sections
2 and 3. On the other hand, the $L_1$ and $L_\infty$ versions of the problems have
different geometric structures that can be exploited by our algorithms. In fact,
some of these structures enable us to obtain more efficient solutions for several
$L_1$ and $L_\infty$ problems than their $L_2$ counterparts in the previous sections. For
these reasons, our discussions will focus on the differences between the $L_1$
and $L_\infty$ algorithms and their $L_2$ counterparts.

The 2-D min-# and min-$\varepsilon$ problems under the infinite beam criterion using
the $L_1$ and $L_\infty$ metrics were considered in [4]. It has been shown in [4] that
a main difference between the $L_1$ and $L_\infty$ problems and the $L_2$ ones lies in the
different shapes of the error tolerance regions. While the 2-D $L_2$ error tolerance
region of a point is a disc in the plane, the 2-D $L_\infty$ (resp., $L_1$) error tolerance
region of a point is a square (resp., diamond). Our algorithms solve these 2-D
$L_1$ and $L_\infty$ problems in the same time bounds as those in [4], but using only
$O(n)$ space. Our 2-D $L_1$ and $L_\infty$ algorithms are much like their $L_2$ counterparts
in Section 3.

Similarly, we solve the 2-D min-# (resp., min-$\varepsilon$) problems under the tolerance
zone criterion using the $L_1$ and $L_\infty$ metrics in $O(n^2)$ (resp., $O(n^2 \log n)$) time and
$O(n)$ space. Although we are not aware of any previous algorithms specifically for
these 2-D $L_1$ and $L_\infty$ problems, one could certainly use the 2-D min-# (resp.,
min-$\varepsilon$) techniques with the $L_2$ metric in [2] to solve these problems in $O(n^2)$
(resp., $O(n^2 \log n)$) time and $O(n^2)$ space.

Abstract. We present efficient deterministic parallel algorithmic
techniques for solving geometric problems in BSP-like coarse-grain network
models. Our coarse-grain network techniques seek to achieve scalability 
and minimization of both the communication time and local computation 
time. These techniques enable us to solve a number of geometric problems 
in the plane, such as computing the visibility of non-intersecting line seg- 
ments, computing the convex hull, visibility, and dominating maxima of 
a simple polygon, two- variable linear programming, determination of the 
monotonicity of a simple polygon, computing the kernel of a simple poly- 
gon, etc. Our coarse-grain algorithms represent theoretical improvement 
over previously known results, and take into consideration additional 
practical features of coarse-grain network computation. 

1 Introduction 

A coarse-grain parallel computer has a relatively small number of processors (the 
number of processors may range from several to a few thousands) . Each proces- 
sor of a coarse-grain parallel computer, usually a state-of-the-art processor in 
itself, has fairly sophisticated computing power and a quite large local memory; 
hence the processor can store a large amount of data and perform considerably 
complicated computation by itself. The processors are connected together with 
an interconnection network (e.g., a hypercube, mesh, or fat tree architecture). 
A processor can access data stored in a non-local memory by communicating
with other processors via the network. This class of machines represents a
mainstream of today’s general-purpose parallel computers that are marketed
commercially (e.g., nCUBE, KSR, Intel Paragon, Intel iPSC/860, CM-5, Cray T3E,
SP-1 and SP-2), and has been used in various applications.

Clearly, the coarse-grain network models are more practical than the fine-grain
models (it is assumed that a fine-grain processor has only $O(1)$ local memory,
but the number of such processors in a parallel machine can be as large as
needed). However, there are additional obstacles to designing efficient algorithms
on coarse-grain networks. A crucial one is the communication bottleneck:
more data items are likely to be exchanged among processors, but
fewer communication links are available in a (moderate-size) coarse-grain
network. Besides, the theoretical assumption that both the inter-processor
communication operations and local computation operations take $O(1)$ time does not
hold very well for current coarse-grain computers. On many commercial coarse-grain
parallel machines today, communication operations are in general much
more time-consuming than local computation operations (typically by
an order of magnitude). For example, on a 64-processor nCUBE 2 supercomputer,
an addition operation performed on local data takes 0.5 microseconds,
while an operation for transferring a data item between two neighboring processors
takes about 200 microseconds [16], 400 times as long as the addition
operation. Therefore, it makes sense to distinguish two kinds of time complexity,
one for local computation and the other for communication, and to seek to
minimize each of them.
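
To make the gap concrete, the nCUBE 2 figures quoted above can be plugged into a simple two-term estimate. This is only an illustrative sketch: the function name and the additive cost model are assumptions made here, not part of the cited measurements.

```python
# Figures quoted in the text for a 64-processor nCUBE 2 [16], in microseconds.
LOCAL_ADD_US = 0.5        # one addition on local data
NEIGHBOR_SEND_US = 200.0  # one data item sent to a neighboring processor


def estimated_time_us(local_ops, items_sent):
    """Hypothetical estimate: local work and communication are simply added."""
    return local_ops * LOCAL_ADD_US + items_sent * NEIGHBOR_SEND_US


# How many local additions cost as much as one neighbor transfer.
ratio = NEIGHBOR_SEND_US / LOCAL_ADD_US
```

Under this model, sending 10 items costs as much as 4000 local additions, which is why the two kinds of time are accounted for separately.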

We target not only minimization of both the local computation time and communication
time, but also coarse-grain network algorithms that are
scalable (i.e., the algorithmic efficiency is achievable on a wide range of coarse-grain
machines with various ratios of problem size to number of processors).
The 1992 “Grand Challenges” report [13] listed as a major goal in research the 
design of scalable parallel algorithms for application problems. In this paper, 
we focus on developing efficient deterministic algorithmic techniques for solving 
computational geometry problems in coarse-grain network models. 

In the last several years, there has been considerable work on developing 
coarse-grain algorithms for geometric problems. These coarse-grain geometric algorithms
are typically based on Valiant’s “bulk synchronous” processing (BSP)
model [19,20] or some of its variations. In the BSP model, a problem of size $n$ is
stored evenly in a $p$-processor parallel computer (each processor contains $O(n/p)$
data). A parallel algorithm based on this model consists of a sequence of supersteps
(or communication rounds). In each superstep or round, every processor
can send/receive messages of size $O(n/p)$ to other processors, and perform computation
on its local data. Typical communication operations performed in each
round are global sort [11], all-to-all broadcast [7], personalized all-to-all
broadcast [7], and partial sum (scan) [7]. The goal is to minimize the number of
communication rounds as well as the local computation time taken by the algorithm.
If the best known sequential algorithm for a problem of size $n$ takes $T_s(n)$
time, then ideally one would like to obtain a parallel algorithm in the BSP model
using $O(1)$ communication rounds and $O(T_s(n)/p)$ local computation time.

For the problems studied in this paper, there have been interesting parallel
geometric algorithms in BSP-like models. In particular, efficient algorithms
are known for computing the 2-D convex hull [4,6,8,9,10,21] and 3-D convex
hull [5,6,12], dominating maxima [4,6,7,12,18], and visibility of planar non-intersecting
line segments from a point [6,7,18]. All these algorithms rely on
sorting $O(n)$ input items and using messages of size $O(n/p)$. Note that recently,
Goodrich [11] developed an efficient sorting algorithm in the BSP model with
optimal local computation time and communication rounds.
We use a variation of the BSP model in which the $p$ processors of a parallel
computer are connected by a communication network (e.g., mesh, hypercube, or
fat tree). This model is called the coarse grained multicomputer (CGM) in [7].
We assume that each processor stores $O(n/p)$ data items for a problem of size $n$
and that $n/p$ is sufficiently large. In contrast to previous algorithms in BSP-like
models, which usually use messages of size $O(n/p)$, we further assume that it takes
less time to send “short” messages of size $O((n/p)^\alpha)$, for some constant $\alpha$ with
$0 < \alpha < 1$, than “long” messages of size $O(n/p)$. This assumption is reasonable
because on current parallel computers, the time for sending a message typically
consists of a startup overhead time and a time depending on the message length.
When $n/p$ is very large, sending messages of size $O(n/p)$ can become expensive
because of their lengths. On the other hand, if one only sends/receives short
messages of size $o(n/p)$, then massive global data movement operations such
as global sorting would require many communication rounds (instead of the
desirable $O(1)$ rounds). Therefore, an efficient algorithm in the model we use
should try to solve the target problem by avoiding long messages as much as possible
while at the same time using as few communication rounds as possible.
Specifically, this means that one should try to avoid performing massive global
data movement operations such as sorting $O(n)$ data items unless necessary.
It also means that one should try to solve problems by exchanging only $o(n)$
(instead of $O(n)$) data among the $p$ processors.
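
The short-versus-long message assumption can be illustrated with a linear cost model: a fixed startup overhead plus a term proportional to the message length. The sketch below is hypothetical; the default `startup` and `per_item` values are placeholders chosen for illustration, not measurements from any machine.

```python
def message_time(length, startup=100.0, per_item=1.0):
    """Assumed linear model: startup overhead plus a length-dependent term."""
    return startup + per_item * length


def short_message_length(n, p, alpha):
    """Length (n/p)**alpha of a 'short' message, for a constant 0 < alpha < 1."""
    return (n / p) ** alpha


n, p, alpha = 10**6, 100, 0.5
long_len = n / p                               # a 'long' message of n/p items
short_len = short_message_length(n, p, alpha)  # about 100 items for these values
```

With these placeholder constants, one long message of 10,000 items costs roughly fifty times as much as one short message, and the gap grows as $n/p$ grows.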

We present efficient deterministic coarse-grain network algorithms for solving
several important geometric problems in the plane, such as computing the convex
hull, visibility, and dominating maxima of a simple polygon, two-variable
linear programming, determination of the monotonicity of a simple polygon,
computing the kernel of a simple polygon, etc. Most of these algorithms take
$O(1)$ communication rounds and optimal local computation time, yet are able
to avoid global sorting and to use only “short” messages of size $O((n/p)^\alpha)$, where
$\alpha$ is a constant with $0 < \alpha < 1$. Note that using the algorithms in [7] to solve
some of the problems we study would be less efficient (requiring massive data
movement). As far as scalability is concerned, our algorithms require that
$n/p > p^\epsilon$ for any constant $\epsilon$ with $0 < \epsilon < 1$, in comparison to $n/p > p$
in [7]. We also give an efficient algorithm for computing the visibility of planar
non-intersecting line segments from a point. This algorithm improves the scalability
of the previously best known algorithm in [7] from $n/p > p$ to $n/p > p^\epsilon$
while still using $O(1)$ communication rounds and optimal local computation time
(but using long messages as in [7]). Our visibility algorithm is in fact quite
different from that in [7].

2 Visibility of a Simple Polygon from a Point 

An important case of the visibility problem studied in Section 3 is the one in which
the $n$ line segments in $S$ form the boundary of a simple polygon $P$ (i.e., every
one of the $n$ segments shares each of its two endpoints with exactly one other
segment and has no other intersection with any other segment). In his book
on art gallery problems and algorithms [17], O’Rourke argues that this case is
perhaps the most fundamental problem in visibility.
Fig. 1. The $\prec$ order of a visibility chain need not be consistent with the chain
order $\prec_C$ along $C$.

We are not aware of any previous BSP-like algorithm specifically for computing
the visibility of a simple polygon. Although one could use the CGM
visibility algorithm in [7] to solve the polygon case, doing so would require massive
data movement such as global sorting. In this section, we give an efficient
algorithm for this problem that uses $O(1)$ communication rounds and optimal
local computation time. Furthermore, this algorithm only uses messages of size
$O((n/p)^\alpha)$, where $\alpha$ is a fixed constant with $0 < \alpha < 1$, thus avoiding expensive
communication operations like global sorting.

Let $P$ be a simple polygon bounded by a closed polygonal chain $C$ of $n$
vertices $p_1, p_2, \ldots, p_n$ in the clockwise order. WLOG, let the source point $q$
be at $(0, \infty)$ (the general case can be handled in a similar fashion, as shown in
[2]). We assume that the input chain $C$ is given sorted by the chain order $\prec_C$,
i.e., the order in which the vertices appear along $C$. The order $\prec_C$ is described
implicitly by the way in which the elements (e.g., the vertices and edges) of $C$
are initially stored in the processors $PE_1, PE_2, \ldots, PE_p$:

— Every element in processor $PE_i$ precedes, in $\prec_C$, every element in $PE_{i+1}$.

— The $n/p$ elements in each $PE_i$ are in the sorted order of $\prec_C$.

The output to be produced is a visibility chain of $C$, which we denote by $VIS(C)$,
consisting of a subset of $C$ based on some sorted order $\prec$ that is different from
$\prec_C$. E.g., for a finite source point $q$, $\prec$ is the sorted order of the vertices of
$VIS(C)$ according to their polar angles with respect to $q$. When $q$ is at $(0, \infty)$, $\prec$
is simply the $<$ order of the $x$-coordinates of $VIS(C)$. Hence, the desired output
is the set of $\leq n$ vertices of $VIS(C)$, stored in processors $PE_1, PE_2, \ldots, PE_p$
with each $PE_i$ containing $O(n/p)$ such vertices, in the sorted order of $\prec$.

One may point out that along the final visibility chain $VIS(C)$, the $\prec$ and
$\prec_C$ orders are consistent with each other. This is indeed true for $VIS(C)$ if $C$ is a
closed chain. However, since our algorithm uses a divide and conquer strategy, we
must cut the closed chain $C$ into many open chains, and the $\prec$ and $\prec_C$ orders on
the visibility chain of an open chain need not be consistent with each other.
Figure 1 illustrates that $\prec$ is quite different from $\prec_C$. The vertices of $VIS(C')$ in
this example are in the following order, which is clearly different from $\prec_C$: $p_{13}$,
$p_{12}, p_9, p_8, p_7, p_5, p_3, p_1, p_2, p_4, p_6, p_{10}, p_{11}, p_{14}, p_{15}, p_{18}, p_{19}, p_{22}, p_{24}, p_{26}, p_{28}$,
$p_{30}, p_{29}, p_{27}, p_{25}, p_{23}, p_{21}, p_{20}, p_{17}, p_{16}$.

The known optimal parallel fine-grain algorithms for this problem (e.g., on
the PRAM [2] or hypercube [1]) explicitly maintain the vertices of $VIS(C')$, for
every open chain $C'$ used in such an algorithm, in the sorted order of $\prec$. This
sorted order of $VIS(C')$ is needed because some of the key operations in such
algorithms are parallel $k$-ary searches on $VIS(C')$ (where $k > 2$ is an integer parameter
chosen by the algorithms). Although maintaining $VIS(C')$ in the sorted
order of $\prec$ can be done efficiently in the fine-grain models (by exploiting some
useful properties of this visibility problem and certain network structures), doing
so in our coarse-grain model would mean massive data movement, which we seek
to avoid as much as possible. In fact, we want to send only short messages of
size $O((n/p)^\alpha)$, for a fixed positive constant $\alpha < 1$. Therefore, a difficulty for our
coarse-grain algorithm is to avoid, especially at recursion levels below the top
one, explicitly maintaining visibility chains in the sorted order of $\prec$, while still
being able to perform parallel $k$-ary searches correctly and efficiently. We will
make crucial use of the geometry of this problem, especially the fact that knowing
the order $\prec_C$ enables one to obtain significant information about $VIS(C')$ in
an almost sorted $\prec$ order.

Let $g = \min\{(n/p)^{\alpha/2}, p\}$ be a control parameter of our divide and conquer
algorithm. Let the $p$ processors $PE_1, PE_2, \ldots, PE_p$ form a group $G$, with each
$PE_i$ storing an $(n/p)$-vertex contiguous subchain of $C$. We henceforth let the order
$\prec$ be the order $<$ of the $x$-coordinates.

Algorithm Polygon-Visibility($G$)

1. If the group $G$ consists of one processor $PE$: Let $PE$ compute $VIS(C')$,
where $C'$ is the subchain of $C$ stored in $PE$. This computation is done by
$PE$ by simply using the sequential algorithm in [15]. After this computation
is finished, return.

2. Otherwise: Let the group $G$ consist of $m > 1$ processors $Q_1, Q_2, \ldots, Q_m$.
The following steps are carried out.

(2.1) Partition $G$ into $g$ subgroups $G_1, G_2, \ldots, G_g$, with each group $G_i$ consisting
of processors $Q_{\frac{m(i-1)}{g}+1}, Q_{\frac{m(i-1)}{g}+2}, \ldots, Q_{\frac{mi}{g}}$. Let $C_i$ be the polygonal
chain which is the concatenation of the chains stored in the processors of $G_i$.

(2.2) In parallel for $i = 1, 2, \ldots, g$, recursively call Algorithm Polygon-Visibility($G_i$)
to compute $VIS(C_i)$.

(2.3) Combine the subsolutions from the $g$ recursive calls to obtain $VIS(C_G)$,
where $C_G$ is the chain which is the concatenation of all the $C_i$’s.
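
The recursive structure of the algorithm (one-processor base case, split into $g$ contiguous subgroups, recurse, combine) can be sketched sequentially as follows. This is a structural illustration only: `solve_one` and `combine` are hypothetical placeholders for the sequential algorithm of step 1 and the combining step (2.3), whose details the paper omits, and the recursive calls run one after another here rather than in parallel.

```python
def split_group(group, g):
    """Split a list of m processors into g contiguous, near-equal subgroups."""
    m = len(group)
    return [group[i * m // g:(i + 1) * m // g] for i in range(g)]


def divide_and_conquer(group, g, solve_one, combine):
    """Skeleton of steps 1, (2.1), (2.2) and (2.3); the geometry is not modeled."""
    if g < 2:
        raise ValueError("the control parameter g must be at least 2")
    if len(group) == 1:
        return solve_one(group[0])               # step 1: sequential base case
    parts = split_group(group, min(g, len(group)))   # step (2.1)
    subsolutions = [divide_and_conquer(part, g, solve_one, combine)
                    for part in parts]           # step (2.2), sequentially here
    return combine(subsolutions)                 # step (2.3)
```

For example, with `solve_one=lambda q: [q]` and a `combine` that concatenates lists, the call returns the processors in their original chain order, which shows that the split preserves contiguity.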

The key is to resolve appropriately two related issues: (1) how to represent
$VIS(C_i)$ without having to explicitly maintain $VIS(C_i)$ in the sorted
order of $<$, and (2) how to perform parallel searches on $VIS(C_i)$ efficiently using
such a representation. Our ideas for handling these two issues are as follows.

1. Partition the portions of $VIS(C_i)$ which are stored in a processor $PE$ of $G_i$
into $O(1)$ connected components (this is possible by Lemma 4.1 of [2]).

2. Select a sorted set of $O(m/g)$ vertices of $VIS(C_i)$ in the $<$ order to represent
the sorted order of $VIS(C_i)$ and to guide the parallel searches into
appropriate portions of individual processors.

Our algorithm makes use of many geometric structures of this polygon vis- 
ibility problem, some of which were given in [1,2]. We must omit many details 
due to the space limit. 

3 Visibility of Non-intersecting Line Segments from a 
Point 

Given a set $S$ of $n$ non-intersecting “opaque” line segments $s_1, s_2, \ldots, s_n$ and a
source point $q$ in the plane, the problem is to compute the (possibly unbounded)
region of the plane that is visible from the point $q$. Without loss of generality
(WLOG), we assume that the point $q$ is at $(0, \infty)$ (the algorithm for the general
case is similar). This case of the visibility problem is also called the upper
envelope problem. This problem is solvable sequentially in $O(n \log n)$ time [14].

In [7], an efficient CGM algorithm was presented for the upper envelope problem
that uses $O(1)$ communication rounds and optimal $O(\frac{n \log n}{p})$ local computation
time, provided that $n/p > p$, with each round sending/receiving messages
of size $O(n/p)$. It was posed in [7] as an open problem to improve the scalability
from $n/p > p$. In this section, we present an algorithm different from that in [7],
improving the scalability from $n/p > p$ to $n/p > p^\epsilon$ for any small constant $\epsilon > 0$,
while still using $O(1)$ communication rounds and $O(\frac{n \log n}{p})$ local computation
time, in the same CGM model as [7]. Our algorithm is based on the observations
of Bertolazzi, Salza, and Guerra [3] for solving this visibility problem.

Let $Vert(S)$ denote the end vertices of the segments in $S$. Then the size of
$Vert(S)$, $|Vert(S)|$, is $2n$. WLOG, we assume that no two points in $Vert(S)$
have the same $x$-coordinate. The upper envelope $UE(S)$ of $S$, consisting of the
visible portions of the segments of $S$ from the point $q = (0, \infty)$, is specified by
a sequence of its vertices, in sorted order of their $x$-coordinates. Each vertex $p$
of $UE(S)$ can be characterized as one of the following three types.

1. $p \in Vert(S)$ and $p$ is visible from $(0, \infty)$. That is, there is a vertical ray going
downwards from $(0, \infty)$ that hits $p$ before it intersects any other points on
the segments of $S$.

2. $p$ is on a segment $s(p)$ of $S$, $p \notin Vert(S)$, and there is a vertical ray going
downwards from $(0, \infty)$ that hits $p$ right after it passes through for the first
time an end vertex $v$ of another segment of $S$ such that $v$ is visible from
$(0, \infty)$ (i.e., the interior of the line segment $vp$ intersects no segments of $S$).
In this case, we call the segment $s(p)$ the projection segment of $v$ (since a
light source at $(0, \infty)$ projects $v$ onto $s(p)$ at $p$).

3. $p$ is not on any segment of $S$ but is on a horizontal line $y = -\infty$, and there
is a vertical ray going downwards from $(0, \infty)$ that hits $p$ after it passes
through for the first time an end vertex $v$ of another segment of $S$ such that
$v$ is visible from $(0, \infty)$. In this case, we assume that $p$ is on a special line
segment $s^*$ and $s^*$ is the default projection segment of $v$.
For a segment set $B \subseteq S$, we say that a point $p$ is visible from $q$ with respect
to $B$ if there is a vertical ray going downwards from $(0, \infty)$ that intersects no
other segment of $B$ before hitting $p$. For segment sets $A$ and $B$ with $A \subseteq B \subseteq S$,
we denote by $VIS_B(A)$ the set of vertices in $Vert(A)$ that are visible from $q$ with
respect to $B$. Note that $UE(S)$ is completely described by the vertices in $VIS_S(S)$
and their projection segments, in sorted order of the $x$-coordinates. Hence, our
problem becomes one of computing $VIS_S(S)$, as well as their projection segments,
in sorted order.
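
The definition of $VIS_B(A)$ can be checked directly with a quadratic-time brute-force test. The sketch below is not the $O(n \log n)$ algorithm of [14]; it assumes segments are given as pairs of `(x, y)` endpoint tuples, that no segment is vertical, and that no two end vertices share an $x$-coordinate, in line with the WLOG assumption above. The function names are made up for this illustration.

```python
def y_at(seg, x):
    """Height of the (non-vertical) segment seg at abscissa x."""
    (x1, y1), (x2, y2) = seg
    t = (x - x1) / (x2 - x1)
    return y1 + t * (y2 - y1)


def visible_vertices(A, B):
    """Brute-force VIS_B(A): end vertices of A not hidden from (0, inf) by B.

    A vertex is hidden when some segment of B passes strictly above it
    at the vertex's x-coordinate.
    """
    result = []
    for seg in A:
        for (vx, vy) in seg:
            hidden = any(
                min(s[0][0], s[1][0]) < vx < max(s[0][0], s[1][0])
                and y_at(s, vx) > vy
                for s in B
            )
            if not hidden:
                result.append((vx, vy))
    return sorted(result)  # sorted by x-coordinate
```

For three horizontal segments at heights 0, 2 and 1, where the middle-height segment lies entirely under the top one, only the end vertices of the first two segments are reported.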

Let $R$ be the vertical region bounded by two vertical lines $L_l(R)$ and $L_r(R)$,
with $L_l(R)$ to the left and $L_r(R)$ to the right. Then a segment of $S$, if it intersects
$R$, may or may not have an end vertex contained in $R$. We call the segments of
$S$ that have an end vertex in $R$ the internal segments of $R$, denoted by $S_I(R)$,
and the segments of $S$ that intersect $R$ but do not have any end vertices in $R$ the
crossing segments of $R$, denoted by $S_C(R)$. Since the segments of $S$ do not cross
each other at an interior point, only the highest crossing segment of $R$, which has
the highest intersection point with the bounding lines $L_l(R)$ and $L_r(R)$ of $R$,
may have portions in $R$ that are visible from $q$. Clearly, all the visible vertices of
$Vert(S)$ in $R$, $VIS_S(S) \cap R$, are vertices of $S_I(R)$, and the projection segments
of these visible vertices in $R$ are either some of those in $S_I(R)$ or the highest
crossing segment in $S_C(R)$. For a vertex $v$ of $Vert(S_I(R))$ such that $v$ is in $R$ and
$v$ is visible from $q$ with respect to $S_I(R)$ (i.e., $v \in VIS_{S_I(R)}(S_I(R))$), we define
the projection segment of $v$ in $R$ as the first segment among those in $S_I(R)$ that
is hit by a downward vertical ray from $v$.

The following observations, given in [3], are a key to our algorithm. 

Lemma 1. Let $R$ and $R'$ be two vertical regions with $R \subseteq R'$. Let $v$ be a vertex
of $Vert(S_I(R))$ such that $v$ is in $R$ and $v$ is visible from $q$ with respect to $S_I(R)$.
Then the following are true:

— The vertex $v$ is visible from $q$ with respect to $S_I(R')$ if and only if $v$ is visible
from $q$ with respect to the highest crossing segment of $R$ in $S_I(R')$.

— The projection segment of $v$ in $R'$ is either (1) the projection segment of $v$
in $R$, (2) the highest crossing segment of $R$ in $S_I(R')$, or (3) the default
projection segment $s^*$.

We are now ready to present our divide and conquer algorithm. Let
$g = \min\{n/p, p\}$ ($g$ is a control parameter for our divide and conquer strategy).

Algorithm Upper-Envelope-Main($S$, $n$)

(1) Sort the $2n$ end vertices of $Vert(S)$ by increasing $x$-coordinates. Use $p - 1$
vertical lines $L_1, L_2, \ldots, L_{p-1}$ to partition the plane into $p$ vertical regions
$R_1, R_2, \ldots, R_p$, each of which contains $2n/p$ vertices of $Vert(S)$. For
$i = 1, 2, \ldots, p$, let processor $PE_i$ store the $2n/p$ vertices of $Vert(S) \cap R_i$ as well as
all the internal segments of $R_i$. (Note that a segment of $S$ can be stored in
two processors, but this does not matter.)

(2) Let the $p$ processors $PE_1, PE_2, \ldots, PE_p$ form a group $G$, and call Procedure
Upper-Envelope($G$) to compute the upper envelope of $S$.
Procedure Upper-Envelope($G$)

1. If the group $G$ consists of one processor $PE$: Let $PE$ compute the upper
envelope in the region $R$ with respect to $S_I(R)$ (i.e., the set of the internal
segments of $R$), where $R$ is the vertical region with which the processor $PE$ is
associated. That is, compute the vertices in $VIS_{S_I(R)}(S_I(R)) \cap R$ and their
projection segments in $R$. This computation is done by $PE$ by simply using
the sequential algorithm in [14]. After this computation is finished, return.

2. Otherwise: Let the group $G$ consist of $m > 1$ processors $Q_1, Q_2, \ldots, Q_m$.
The following steps are carried out.

(2.1) Partition $G$ into $g$ subgroups $G_1, G_2, \ldots, G_g$, with each group $G_i$
consisting of processors $Q_{\frac{m(i-1)}{g}+1}, Q_{\frac{m(i-1)}{g}+2}, \ldots, Q_{\frac{mi}{g}}$. Let $R(G_i)$ be the
vertical region which is the union of all the vertical regions associated with
the processors of $G_i$.

(2.2) In parallel for $i = 1, 2, \ldots, g$, recursively call Procedure Upper-Envelope($G_i$)
to compute the upper envelope in the region $R(G_i)$ with
respect to the set $S_I(R(G_i))$ of the internal segments of $R(G_i)$.

(2.3) Combine the subsolutions from the $g$ recursive calls to obtain the upper
envelope in the region $R(G) = \bigcup_{i=1}^{g} R(G_i)$ with respect to the set of the
internal segments of $R(G)$.

If Procedure Upper-Envelope($G$) performs its computation
correctly (particularly in step (2.3)), then at the top level of the recursion, it
finds the upper envelope of $S$. This is because at the top recursion level, the
region associated with all the $p$ processors is the whole plane and certainly all
the segments of $S$ are internal segments of the whole plane.

In the rest of this section, we discuss the computation of Procedure Upper-Envelope($G$).
(In fact, we only need to discuss step (2.3), which we call the
Combining Step.) We also analyze the number of communication rounds and
local computation time taken by Algorithm Upper-Envelope-Main($S$, $n$).

Note that based on Lemma 1, to obtain visibility information for a vertical
region $R'$ from that of another vertical region $R$ with $R \subseteq R'$, it is sufficient to
identify the highest crossing segment of $R$ in $S_I(R')$. In the case of step (2.3),
once the highest crossing segment $h_i$ of $R(G_i)$ in $S_I(R(G))$ is identified for every
$R(G_i)$, we are done. This is because we then only need to check every vertex
$v \in VIS_{S_I(R(G_i))}(S_I(R(G_i))) \cap R(G_i)$ against $h_i$ to see whether $v$ belongs to
$VIS_{S_I(R(G))}(S_I(R(G))) \cap R(G)$ and (if so) whether the projection segment of $v$
in $R(G)$ should become $h_i$. Hence, we only need to show how to compute the
highest crossing segment $h_i$ of $R(G_i)$ in $S_I(R(G))$ for every $R(G_i)$.
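
The per-vertex check described above can be sketched as a small function. It is an illustration under assumptions made here: segments are pairs of `(x, y)` tuples, `None` stands for the default projection segment $s^*$ (or for the absence of a crossing segment), and the crossing segment `h` spans the vertex's $x$-coordinate. The name `combine_vertex` is hypothetical.

```python
def y_at(seg, x):
    """Height of the (non-vertical) segment seg at abscissa x."""
    (x1, y1), (x2, y2) = seg
    return y1 + (x - x1) / (x2 - x1) * (y2 - y1)


def combine_vertex(v, proj, h):
    """Return (visible, projection) for vertex v in the larger region.

    v    -- a vertex visible within the smaller region
    proj -- its projection segment there, or None for the default s*
    h    -- the highest crossing segment of the smaller region, or None
    """
    vx, vy = v
    if h is None:
        return True, proj
    hy = y_at(h, vx)
    if hy > vy:                      # h passes above v: v is no longer visible
        return False, None
    if proj is None or y_at(proj, vx) < hy:
        return True, h               # h is hit first by the downward ray from v
    return True, proj                # the old projection segment is still first
```

The three outcomes for the projection segment correspond to the three cases listed in Lemma 1.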

The Combining Step consists of the following substeps, performed in par- 
allel for every Gi. 

Substep (i) Let $L_l(R(G_i))$ (resp., $L_r(R(G_i))$) be the left (resp., right) vertical
bounding line of the region $R(G_i)$. Let $LS_i$ (resp., $RS_i$) be the segments of
$S$ that intersect $L_l(R(G_i))$ (resp., $L_r(R(G_i))$) and have their right (resp., left)
end vertices in $R(G_i)$. Then $LS_i \cup RS_i$ is a subset of the internal segment set
$S_I(R(G_i))$ of $R(G_i)$. Let $X(LS_i)$ (resp., $X(RS_i)$) be the set of the $x$-coordinates
of the left (resp., right) end vertices of the segments in $LS_i$ (resp., $RS_i$). Sort
among the processors of $G_i$ the segments in $LS_i$ (resp., $RS_i$) based on the decreasing
(resp., increasing) order of $X(LS_i)$ (resp., $X(RS_i)$).

Substep (ii) For each segment $t_a$ in the sorted set $LS_i$ (resp., $RS_i$), associate
with $t_a$ its intersection point with the left (resp., right) bounding line $L_l(R(G_i))$
(resp., $L_r(R(G_i))$) of $R(G_i)$. Then, perform a partial sum operation among the
processors of $G_i$ to find, for every $t_a$ in $LS_i$ (resp., $RS_i$), the segment $t_b$ in $LS_i$
(resp., $RS_i$) such that among all segments $t_{a'}$ in $LS_i$ (resp., $RS_i$) with $a' \geq a$, $t_b$
has the highest intersection with $L_l(R(G_i))$ (resp., $L_r(R(G_i))$). Let $z(t_a) = t_b$.

Remark: Let $R(G_j)$ be a vertical region to the left of $R(G_i)$. Let $x_l(t_a)$ be
the $x$-coordinate of the left end vertex of a segment $t_a \in LS_i$. For the left vertical
bounding line $L_l(R(G_j))$ of $R(G_j)$, if $L_l(R(G_j))$ is between $x_l(t_a)$ and $x_l(t_{a-1})$
(with $x_l(t_0)$ being equal to the $x$-coordinate of $L_l(R(G_i))$), then it is clear that
$z(t_a)$ is the highest crossing segment of $R(G_j)$ in $S_I(R(G_i))$.

Substep (iii) Compute the highest crossing segment $h_i$ of $R(G_i)$ in $S_I(R(G))$:

1. Send from $G_i$ its left (resp., right) bounding line $L_l(R(G_i))$ (resp., $L_r(R(G_i))$)
to every processor of each group $G_k$ such that $R(G_k)$ is to the right (resp.,
left) of $R(G_i)$. This is done by an all-to-all broadcast in $G$. (Hence, each
processor of $G_i$ receives $O(g)$ bounding lines from other groups $G_j$.)

2. The processors in $G_i$ find, for each left bounding line $L_l(R(G_j))$ with $j < i$,
the segment $t_a \in LS_i$ such that $x_l(t_a) < x(L_l(R(G_j))) < x_l(t_{a-1})$. This is
done in each processor $PE$ of $G_i$ by a binary search for $x(L_l(R(G_j)))$ in the
portion of $X(LS_i)$ stored in $PE$. Perform a similar computation in $G_i$ for
each $L_r(R(G_k))$ with $k > i$.

3. Send from $G_i$ the segment $z(t_a)$ to the corresponding group $G_j$ with $j \neq i$,
such that $z(t_a)$ was found in the previous step specifically for $G_j$. This is
done by a certain personalized all-to-all broadcast in $G$. (Hence, $G_i$ receives
altogether $O(g)$ crossing segments from other $G_j$’s.)

4. The processors of $G_i$ decide the highest crossing segment $h_i$ of $R(G_i)$ in
$S_I(R(G))$.
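
Two building blocks of the Combining Step can be sketched on a single machine: the partial sum of Substep (ii), read here as a running maximum taken from the end of the sorted list (assuming the range includes $t_a$ itself), and the binary search in step 2 of Substep (iii). Both functions are illustrations with made-up names; they operate on plain Python lists rather than on data distributed over the processors of a group.

```python
from bisect import bisect_right
from itertools import accumulate


def suffix_highest(heights):
    """z[a] = index of the largest height among positions a, a+1, ...

    heights[a] is the height at which the a-th segment (in sorted order)
    meets the bounding line.
    """
    indices = range(len(heights))
    running = accumulate(
        reversed(indices),
        lambda best, i: i if heights[i] > heights[best] else best,
    )
    return list(running)[::-1]


def first_left_of(xs_desc, x):
    """Index a of the first entry of the decreasing list xs_desc below x.

    Then xs_desc[a] < x, and either a == 0 or x <= xs_desc[a - 1].
    Returns None if no entry lies to the left of x.
    """
    negated = [-v for v in xs_desc]    # increasing order, as bisect requires
    a = bisect_right(negated, -x)
    return a if a < len(xs_desc) else None
```

Composing the two, `suffix_highest(heights)[first_left_of(xs, x)]` gives the index of the segment that plays the role of $z(t_a)$ for a bounding line at abscissa `x`.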

The correctness of the Combining Step follows from the remark after Substep
(ii), and the correctness of Algorithm Upper-Envelope-Main($S$, $n$) follows
from Lemma 1. It is not hard to show that Algorithm Upper-Envelope-Main($S$, $n$)
uses $O(1)$ communication rounds and $O(\frac{n \log n}{p})$ local computation
time. To see that the scalability of the algorithm is $n/p > p^\epsilon$, observe that the
algorithm only needs to use messages of length $O(n/p + g) = O(n/p)$, because
$g = \min\{n/p, p\}$ and hence $g \leq n/p$.
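
One way to see why the number of recursion levels stays constant: each level splits a group into $g$ subgroups, so about $\log_g p$ levels are needed, and when $n/p$ is at least $p^\epsilon$ the control parameter $g = \min\{n/p, p\}$ is at least $p^\epsilon$, giving at most $\lceil 1/\epsilon \rceil$ levels. The sketch below only counts levels under that reading; it is an illustration, not the paper's analysis, and the function names are made up.

```python
def control_parameter(n, p):
    """g = min{n/p, p}, using integer division for n/p."""
    return min(n // p, p)


def recursion_levels(n, p):
    """Number of g-way splits needed to go from p processors down to one."""
    g = control_parameter(n, p)
    if g < 2:
        raise ValueError("n/p and p must both be at least 2")
    levels, covered = 0, 1
    while covered < p:
        covered *= g
        levels += 1
    return levels
```

For instance, with $n = 2^{20}$ and $p = 2^{12}$ we have $n/p = 2^8 = p^{2/3}$, and two levels suffice.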

4 Other Geometric Problems 

Once the polygon visibility problem is solved, efficient coarse-grain algorithms
can be obtained for several other polygon problems, including computing the 
convex hull and dominating maxima of a simple polygon, determination of the 
monotonicity of a simple polygon, computing the kernel of a simple polygon, 
etc. The details of these algorithms will be given in the full paper. 
Abstract. In this paper, we generalize the Ski-Rental Problem to the
Bahncard Problem, which is an online problem of practical relevance for
all travelers. The Bahncard is a railway pass of the Deutsche Bundesbahn 
(the German railway company) which entitles its holder to a 50% price 
reduction on nearly all train tickets. It costs 240 DM, and it is valid for 
12 months. Similar bus or railway passes can be found in many other 
countries. 

For the common traveler, the decision at which time to buy a Bahn- 
card is a typical online problem, because she usually does not know 
when and to which place she will travel next. We show that the greedy 
algorithm applied by most travelers and clerks at ticket offices is not 
better in the worst case than the trivial algorithm which never buys a 
Bahncard. We present two optimal deterministic online algorithms, an 
optimistic one and a pessimistic one. We further give a lower bound
for randomized online algorithms and present an algorithm which we 
conjecture to be optimal; a proof of the conjecture is given for a special 
case of the problem. 

1 Introduction 



In the Ski-Rental Problem (SRP) [9, p. 113], a sportsman can either rent a pair
of skis for 1 DM a day, or buy a pair of skis for $N$ DM. As long as he has
not bought his skis, he must decide before each trip whether to buy the skis
this time or to wait until the next trip (which might never come). The SRP
can be solved by algorithms for the page replication problem on two nodes $A$
and $B$ with distance 1 and replication cost $N$ (initially, the file sits on node
$A$, and all requests are to node $B$), or the two-server problem on a triangle
with side lengths $(1, N, N)$ (nodes $A$ and $B$ have distance 1; initially, the two
servers sit on nodes $A$ and $C$, and the requests alternate between $A$ and $C$). For
the page replication problem, there are optimal 2-competitive deterministic [4]
and $\frac{e_N}{e_N-1}$-competitive randomized algorithms against an oblivious adversary
[1,8]. A similar bound was obtained by Karlin et al. [7] for the problem of
two servers on a $(1, N, N)$-triangle.
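
For intuition about the deterministic bound of 2, the classical break-even strategy for ski rental can be simulated in a few lines. This strategy is standard textbook material rather than something taken from this paper: rent for the first $N - 1$ days and buy on day $N$. The function names are chosen here for illustration.

```python
def break_even_cost(days, N):
    """Cost of renting at 1 DM/day for N-1 days and buying (N DM) on day N."""
    return days if days < N else (N - 1) + N


def offline_cost(days, N):
    """Optimal cost with full knowledge of the number of skiing days."""
    return min(days, N)


N = 10
worst_ratio = max(break_even_cost(d, N) / offline_cost(d, N)
                  for d in range(1, 51))
```

The worst case occurs as soon as the skis are bought, where the online cost is $2N - 1$ against an offline cost of $N$, so the ratio $2 - 1/N$ stays below 2.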

In this paper, we consider the Bahncard Problem which contains the SRP as a 
special case (another generalization of the SRP was given in [2]). The Bahncard 
is a railway pass of the Deutsche Bundesbahn (the German railway company). It
costs 240 DM, and it is valid for 12 months. Within this period, a traveler can buy 
train tickets for half of the regular price. Looking back at her travel schedule 
of the last few years, a traveler can easily determine when several expensive 
trips had been sufficiently close together to justify the additional expense of a 
Bahncard. Unfortunately, at any given time the traveler cannot see far into the 
future, so her decision when to buy a Bahncard is made with a high degree of 
uncertainty. 

Let BP($C, \beta, T$) denote the $(C, \beta, T)$-Bahncard Problem, where a Bahncard
costs $C$, reduces any ticket price $p$ to $\beta p$, and is valid for time $T$. For example,
the German Bahncard Problem GBP is BP(240 DM, $\frac{1}{2}$, 1 year), and the SRP is
BP($N$, 0, $\infty$) with the additional constraint that each ticket costs 1 DM.

The SRP and the Bahncard Problem are online problems, i.e., all decisions
must be made without any knowledge of the future. The quality of an online 
algorithm is measured by the ratio of its performance and the performance of 
an optimal offline algorithm with full knowledge of the future. The supremum 
of this ratio over all possible travel request sequences is called the competitive 
ratio of the online algorithm; the smaller the competitive ratio, the better the 
algorithm [3,6,11]. 

We show that no deterministic online algorithm for BP($C, \beta, T$) can be better
than $(2 - \beta)$-competitive. This lower bound is achieved by SUM, a natural generalization
of the optimal deterministic 2-competitive Ski-Rental algorithm. SUM is
pessimistic about the future in the sense that it always buys at the latest possible
time. Surprisingly, there is another optimal deterministic algorithm, OSUM, which
usually buys much earlier than the pessimistic SUM (in fact, it buys at the earliest
possible time). This gives the rare chance of combining competitive analysis
with probabilistic analysis: a traveler with a low travel frequency should use
the pessimistic algorithm, whereas a frequent traveler should use the optimistic 
algorithm. Then both travelers will be happy in the worst case (because both 
algorithms achieve an optimal competitive ratio), and on the average (because 
the pessimistic algorithm tries to avoid buying, in contrast to the optimistic 
algorithm). 

Since an online algorithm must make its decisions in a state of uncertainty 
about future events, it seems plausible that randomization should help the al- 
gorithm (because this may help to average between good and bad unpredictable 
future developments). Ben-David et al. [3] defined several models for randomized
competitive analysis and compared their relative strengths. In this paper,
we assume an oblivious adversary. In this model, a request sequence is fixed 
in advance and the competitive ratio of a randomized algorithm is a random 
variable, only dependent on the random moves of the algorithm. 

We show that randomized variants of SUM and OSUM are -competitive 
against an oblivious adversary. This beats the deterministic lower bound for 
$\beta \in (0,1)$, but it does not reach the lower bound of which we show

to hold for any randomized algorithm. We give a randomized algorithm which 
achieves this bound in the case of $T = \infty$, i.e., a Bahncard never expires (in
this case, the Bahncard Problem corresponds to a variant of the SRP where the
price for renting the skis includes the daily fee for the lift, which has to be paid
additionally each day after buying the skis). We conjecture that the algorithm
is also optimal for the more realistic case of $T < \infty$.

We note that introducing further “real life” restrictions like limiting the num- 
ber of trips or upper bounding ticket prices has no effect on the worst case be- 
haviour of the problem as long as a single one-way ticket can already be more 
expensive than a Bahncard. 

We apologize for not being able to give complete proofs for most of the 
theorems; they can be found in the full paper [5]. 



2 Definitions 

Let C > 0, T > 0, and $\beta \in [0,1]$ be fixed constants. The $(C, \beta, T)$-Bahncard 
Problem (or shortly BP$(C, \beta, T)$) is a request-answer game between an algorithm 
A (the traveler) and an adversary (real life). The adversary presents a finite 
sequence of travel requests $\sigma = \sigma_1 \sigma_2 \cdots$. Each $\sigma_i$ is a pair $(t_i, p_i)$, where $t_i \ge 0$ 
is the travel time and $p_i \ge 0$ is the regular price of the ticket. The requests are 
presented in chronological order, i.e., $0 \le t_1 < t_2 < \cdots$. 

The task of A is to react to each travel request by buying a ticket (that cannot 
be avoided), but A can also decide to first buy a Bahncard. A Bahncard bought 
at time $t$ is valid during the time interval $[t, t+T)$. A's cost on $\sigma_i$ is 

$$c_A(\sigma_i) = \begin{cases} \beta \cdot p_i & \text{if A has a valid Bahncard at time } t_i, \\ p_i & \text{otherwise.} \end{cases}$$

We call $\beta \cdot p_i$ the reduced price of the ticket. Accordingly, $\sigma_i$ is a reduced 
request for A if A already had a valid Bahncard at $t_i$; otherwise it is a regular 
request. Note that A might buy a Bahncard at a regular request and then pay 
the reduced price for the ticket. 

If A buys Bahncards at times $0 \le \tau_1 < \cdots < \tau_k$ then we call the sequence 
$\Gamma_A(\sigma) = (\tau_1, \ldots, \tau_k)$ the B-schedule of A on $\sigma$ (since $\sigma$ is finite, the B-schedule is 
also finite). We denote the length $k$ of the B-schedule by $|\Gamma_A(\sigma)|$. The total cost 
of A on $\sigma$ is then $c_A(\sigma) = |\Gamma_A(\sigma)| \cdot C + \sum_i c_A(\sigma_i)$. We do not always distinguish 
clearly between an algorithm A and its B-schedule $\Gamma_A$, so $c_{\Gamma_A}(\sigma)$ means the same 
as $c_A(\sigma)$, for example. 

Besides the total cost of A, we are interested in partial costs during some 
time interval $I$. Let $p^I(\sigma) = \sum_{i : t_i \in I} p_i$ be the cost of all requests in $I$, and let 
$c^I_A(\sigma) = \sum_{i : t_i \in I} c_A(\sigma_i)$ be the amount of money spent by A on tickets during $I$. We call 
$I$ cheap if $p^I(\sigma) < c_{crit}$, where $c_{crit} = \frac{C}{1-\beta}$ is the critical cost; otherwise, $I$ is 
expensive. $c_{crit}$ is the break-even point for any algorithm. Buying a Bahncard 
at the beginning of an expensive interval saves money in comparison to paying 
the regular price for all tickets in $I$. Observation 1 (1) below makes this more 
precise. 

We are mainly interested in intervals of length T. The T-recent-cost (or T-cost 
for short) of $\sigma$ at time $t$ is $r^T_\sigma(t) = p^{(t-T,t]}(\sigma)$. The regular T-cost $rr^T_A(t)$ of 
A on $\sigma$ at time $t$ is the sum of all regular requests in $(t-T, t]$ with respect to 
A's B-schedule. Sometimes, we do not want the current request at time $t$ to be 
included in the summation when computing $r^T_\sigma(t)$ or $rr^T_A(t)$. Then we speak of 
the T-cost at $t^-$ instead. 

For any request sequence $\sigma$ there is a B-schedule $\Gamma_{OPT}(\sigma)$ of minimal cost 
$c_{OPT}(\sigma)$. In general, $\Gamma_{OPT}(\sigma)$ is not unique and can only be computed by an offline 
algorithm OPT which knows the entire sequence $\sigma$ in advance. 

Observation 1 Let $\sigma$ be a request sequence and $\Gamma_{OPT}(\sigma)$ be an optimal B-schedule 
for $\sigma$. Then we can assume w.l.o.g. that 

OPT never buys a Bahncard at a reduced request. 

If $I$ is an expensive time interval of length at most T then OPT has at least one 
reduced request in $I$. □ 

An online algorithm A must compute its B-schedule $\Gamma_A(\sigma)$ on the fly, i.e., 
whenever it receives a new request $\sigma_i$ it must decide immediately if it wants to 
add $t_i$ to its B-schedule, without knowing future requests $\sigma_{i+1}, \sigma_{i+2}, \ldots$. Once 
bought, a Bahncard cannot be reimbursed, so A cannot change its B-schedule 
later on. If A uses randomization then the cost of A on a fixed request sequence 
$\sigma$ is a random variable whose expected value is also denoted by $c_A(\sigma)$. A is 
$d$-competitive if $c_A(\sigma) \le d \cdot c_{OPT}(\sigma)$ for all request sequences $\sigma$. A is an optimal 
online algorithm if its competitive ratio $d$ is the smallest possible among all 
online algorithms. If A is a randomized algorithm then this definition describes 
competitiveness against an oblivious adversary (see [3] for definitions of oblivious 
and adaptive adversaries and their respective strengths). Intuitively, an oblivious 
adversary must fix the request sequence $\sigma$ before A starts serving the requests. 
In contrast, an adaptive adversary can construct the request sequence step by 
step, dependent on previous decisions of A. This makes it more difficult for a 
randomized online algorithm to be competitive. However, we do not expect real 
life to behave like an adaptive adversary (ignoring Murphy’s Law), so we assume 
an oblivious adversary throughout this paper. 

3 An Optimal Offline Algorithm 

We note that Observation 1(1) does not imply that an optimal algorithm will 
buy a Bahncard whenever it reaches the first regular request of an expensive 
time interval. In Fig. 1, both the intervals $[0,T)$ and $[T,2T)$ are expensive, but 
if $\epsilon$ is small then the optimal algorithm would buy just one Bahncard at the 
second request. 

Theorem 2. Given n travel requests, we can compute an optimal B-schedule 
and its minimal cost in time $O(n)$. 

Fig. 1. Two expensive intervals, but an optimal algorithm buys at $\sigma_2$ 

Proof. Let $\sigma = \sigma_1 \cdots \sigma_n$ be a sequence of n travel requests. We construct a 
weighted acyclic directed graph $G_\sigma$ with nodes $s = \sigma_0, \sigma_1, \ldots, \sigma_n, \sigma_{n+1} = t$, 
where $s = (0,0)$ and $t = (t_n + T, 0)$ are two new artificial requests. $G_\sigma$ has the 
property that $(s-t)$-paths in $G_\sigma$ correspond to B-schedules, and any shortest 
$(s-t)$-path corresponds to an optimal B-schedule. 

For $i = 0, \ldots, n$, there is an edge $\sigma_i \to \sigma_{i+1}$ of weight $p_i$, and an edge 
$\sigma_i \to \sigma_{j_i}$ of weight $q_i$, where $\sigma_{j_i}$ is the first request after (or at) time $t_i + T$ 
and $q_i$ is the accumulated cost of buying a Bahncard at request $\sigma_i$ and paying 
reduced ticket prices until this Bahncard expires, i.e., 

$$q_i = C + \beta \cdot \sum_{j : t_i \le t_j < t_i + T} p_j .$$



Fig. 2 shows the graph $G_\sigma$ corresponding to the requests of Fig. 1. 

Fig. 2. The graph $G_\sigma$ corresponding to the requests of Fig. 1 

The edge weights as well as a shortest $(s-t)$-path can be computed in 
time $O(n)$ by scanning the nodes $\sigma_0, \ldots, \sigma_{n+1}$ in increasing order [12]. □ 
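
The shortest-path construction can be sketched as a backward dynamic program over the request list. This is a minimal illustration under our own assumptions, not the procedure of [12]: the function name `optimal_offline_cost` is hypothetical, requests are given as chronologically sorted `(time, price)` pairs, and the first request after expiry is located by binary search, so the sketch runs in O(n log n) rather than the O(n) stated in Theorem 2.

```python
from bisect import bisect_left


def optimal_offline_cost(requests, C, beta, T):
    """Minimal total cost for a chronologically sorted list of (time, price)."""
    times = [t for t, _ in requests]
    n = len(requests)
    # prefix[i] = total regular price of the first i requests
    prefix = [0.0]
    for _, p in requests:
        prefix.append(prefix[-1] + p)
    # dist[i] = cheapest way to serve requests i, ..., n-1 with no valid card at i
    dist = [0.0] * (n + 1)
    for i in range(n - 1, -1, -1):
        t_i, p_i = requests[i]
        # Edge of weight p_i: pay the regular price and move to the next request.
        pay_regular = p_i + dist[i + 1]
        # Edge of weight q_i: buy a card now, pay reduced prices in [t_i, t_i + T).
        j = bisect_left(times, t_i + T)  # first request at or after expiry
        buy_card = C + beta * (prefix[j] - prefix[i]) + dist[j]
        dist[i] = min(pay_regular, buy_card)
    return dist[0]


# Four requests within one year, times in days, with C = 240, beta = 1/2, T = 365.
example = [(0, 250), (4, 100), (25, 50), (39, 200)]
print(optimal_offline_cost(example, 240, 0.5, 365))  # 540.0
```

Replacing the binary search by a second index that only moves forward would remove the logarithmic factor, since the expiry position is monotone in i.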



4 Deterministic Online Algorithms 



The Buy-Never-Algorithm NEVER which never buys a Bahncard is obviously 
$\frac{1}{\beta}$-competitive. Before we analyze other algorithms, we show a lower bound on the 
deterministic competitive ratio. 

Theorem 3. No deterministic online algorithm for BP$(C, \beta, T)$ can be better 
than $(2 - \beta)$-competitive. 



Proof. Let A be an online algorithm for BP$(C, \beta, T)$. Let $\epsilon > 0$ be an arbitrarily 
small constant. As long as A has not got a Bahncard, the adversary continues 
showing requests of cost $\epsilon$ (arbitrarily dense, so that all requests are in the 
interval $[0,T)$). If A wants to be better than $\frac{1}{\beta}$-competitive, it must eventually buy a 
Bahncard. Then the adversary stops showing requests. Let $s$ be the accumulated 
cost of the requests so far, not including the current request. Then 

$$c_A(s) = C + s + \beta\epsilon \quad\text{and}\quad c_{OPT}(s) = \begin{cases} s + \epsilon & \text{if } s + \epsilon \le c_{crit}, \\ C + \beta(s + \epsilon) & \text{if } s + \epsilon \ge c_{crit}. \end{cases}$$

Hence, 

$$\frac{c_A(s)}{c_{OPT}(s)} \ge \frac{c_A(c_{crit} - \epsilon)}{c_{OPT}(c_{crit} - \epsilon)} = 2 - \beta - \frac{(1-\beta)^2 \epsilon}{C} \to 2 - \beta \quad\text{for } \epsilon \to 0 .$$

The inequality holds because the quotient takes its minimum value at $s = c_{crit} - \epsilon$ 
in both cases in the definition of $c_{OPT}$. □ 
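
The adversary argument can be checked numerically. The sketch below is only an illustration: the function names are ours, the buying points are restricted to multiples of $\epsilon$, and the parameters C = 240, $\beta$ = 1/2 are chosen for convenience. It evaluates the quotient $c_A(s)/c_{OPT}(s)$ for every buying point $s$ and reports the best ratio any deterministic algorithm can reach against this adversary.

```python
def adversary_ratio(s, eps, C, beta):
    """c_A(s) / c_OPT(s) when A buys after regular cost s has accumulated."""
    cost_alg = C + s + beta * eps          # card, regular tickets, one reduced ticket
    total = s + eps                        # total regular price of all requests
    cost_opt = min(total, C + beta * total)  # never buy, or buy at the first request
    return cost_alg / cost_opt


def best_ratio_for_algorithm(C, beta, eps, max_s):
    """Smallest quotient over all buying points s = 0, eps, 2*eps, ..., max_s."""
    steps = int(max_s / eps)
    return min(adversary_ratio(k * eps, eps, C, beta) for k in range(steps + 1))


C, beta, eps = 240.0, 0.5, 0.01
best = best_ratio_for_algorithm(C, beta, eps, max_s=600.0)
print(best, 2 - beta)  # the first value approaches the second as eps shrinks
```

The minimum is attained next to $s = c_{crit} - \epsilon$, which is where SUM buys.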



Clerks at railway ticket offices usually advise their customers to buy a Bahncard 
iff they are planning to buy one or more tickets of total cost at least $c_{crit}$. 
We call this the Ticket-Office-Algorithm TOA. It has the advantage of being 
memoryless (cf. [10]); however, its competitive ratio is the same as that of NEVER: If 
the request sequence consists of many travel requests of cost slightly less than 
$c_{crit}$ within a short time interval, then TOA never buys a Bahncard, whereas the 
optimal algorithm would buy one at the first request. 

TOA seems to fail because it tries to handle expensive requests optimally but 
it cannot safeguard against a sequence of several cheap requests. To achieve a 
good performance for both types of request sequences we must allow for 
non-optimal behaviour in both cases. The proof of the lower bound indicates that the 
following Algorithm SUM might behave better than TOA. SUM buys a Bahncard at 
a regular request $(t,p)$ iff $rr^T_{SUM}(t) \ge c_{crit}$. In the example of Fig. 1, SUM would 
buy a Bahncard at the second request (and thus incidentally behave optimally). 

Theorem 4. SUM is $(2 - \beta)$-competitive for BP$(C, \beta, T)$. 

Proof. (Sketch) Let $\sigma = \sigma_1 \sigma_2 \cdots$ be a request sequence and let 
$\Gamma_{OPT}(\sigma) = (\tau_1, \ldots, \tau_k)$ be an optimal B-schedule for $\sigma$. This divides time into epochs 
$[\tau_j, \tau_{j+1})$, $0 \le j \le k$, where $\tau_0 = 0$ and $\tau_{k+1} = \infty$. Each epoch (except for, possibly, the 
first and last one) starts with an expensive phase $[\tau_j, \tau_j + T)$, followed by a cheap 
phase $[\tau_j + T, \tau_{j+1})$. 

SUM will buy at most one Bahncard during any epoch. This follows from 
Observation 1 (1) and the fact that $(t - T, t]$ must be an expensive interval if 
SUM buys a Bahncard at time $t$. Therefore, we can upper bound SUM's total cost 
of buying Bahncards by assuming that SUM spends C in every expensive phase, 
in addition to ticket costs. 

Clearly, $c^I_{SUM}(\sigma) \le c^I_{OPT}(\sigma)$ for a cheap phase $I$. So let $I$ be any expensive 
phase. Let $c_{SUM}$ and $c_{OPT}$ denote SUM's and OPT's cost during $I$, respectively 
(including the cost of buying Bahncards). We divide $I$ into three subphases $I_1, I_2, I_3$ 
(some of which can be empty); in $I_1$ and $I_3$, SUM has a valid Bahncard, whereas 
it must pay regular prices in $I_2$. For $i \in \{1,2,3\}$, let $s_i = p^{I_i}(\sigma)$ be the total cost 
of requests in $I_i$. Then 

$$c_{SUM} \le C + s_2 + \beta \cdot (s_1 + s_3) \quad\text{and}\quad c_{OPT} = C + \beta(s_1 + s_2 + s_3) .$$

Hence 

$$\frac{c_{SUM}}{c_{OPT}} \le \frac{C + s_2 + \beta(s_1 + s_3)}{C + \beta(s_1 + s_2 + s_3)} \le \frac{C + c_{crit}}{C + \beta \cdot c_{crit}} = 2 - \beta ,$$

because the first quotient is maximal if $s_1 = s_3 = 0$ and if $s_2$ is maximal, and 
the definition of SUM implies $s_2 < c_{crit}$. □ 

So SUM is optimal for the Bahncard Problem. In particular, it is $\frac{3}{2}$-competitive 
for the GBP. For the SRP, it behaves like the well-known optimal 2-competitive 
algorithm which buys at the N-th request [4]. 

However, SUM tends to be pessimistic about the future: It always buys at the 
latest possible time, namely after it has seen enough regular requests to know 
for sure that an optimal algorithm would already have bought a Bahncard. In 
contrast to that, we consider the Optimistic-Sum-Algorithm OSUM which buys a 
Bahncard at a regular request $(t,p)$ iff $p \ge \frac{c_{crit} - s}{2}$, where $s = rr^T_{OSUM}(t^-)$. 

Observe that OSUM will never buy its i-th Bahncard later than SUM (because 
OSUM buys when $s$ reaches $c_{crit}$), but often will buy earlier. Consider for example 
the GBP. Then OSUM buys a Bahncard whenever $p \ge C - \frac{1}{2}s$. On the request 
sequence (Jun 22, 250 DM), (Jun 26, 100 DM), (Jul 17, 50 DM), (Jul 31, 200 DM), 
for example, OSUM would buy a Bahncard at the first request on Jun 22 and spend 
540 DM for all four tickets (which is optimal), whereas SUM would pay the regular 
price for the first three tickets and buy a Bahncard only at the fourth request 
on Jul 31, thus spending 740 DM. 
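
The example can be replayed with a small simulation. This is a sketch under stated assumptions: the GBP parameters are taken as C = 240 DM, $\beta$ = 1/2 and T = 365 days, dates are converted to day offsets from Jun 22, the OSUM threshold is assumed to be $p \ge (c_{crit} - s)/2$ (which for these parameters reads $p \ge C - s/2$), and the names `simulate`, `sum_rule` and `osum_rule` are ours.

```python
def simulate(requests, C, beta, T, rule):
    """Total cost of a threshold algorithm on sorted (time, price) requests."""
    c_crit = C / (1 - beta)
    card_expiry = float("-inf")    # no Bahncard bought yet
    regular = []                   # regular requests seen so far
    cost = 0.0
    for t, p in requests:
        if t < card_expiry:        # reduced request: a card is still valid
            cost += beta * p
            continue
        # regular T-cost just before t: regular requests in (t - T, t)
        s = sum(q for (u, q) in regular if u > t - T)
        regular.append((t, p))
        if rule(s, p, c_crit):     # buy first, then pay the reduced price
            cost += C + beta * p
            card_expiry = t + T
        else:
            cost += p
    return cost


def sum_rule(s, p, c_crit):
    # SUM: buy once the regular T-cost including the current request reaches c_crit
    return s + p >= c_crit


def osum_rule(s, p, c_crit):
    # OSUM (assumed threshold): buy if p is at least half of the missing amount
    return p >= (c_crit - s) / 2


trips = [(0, 250), (4, 100), (25, 50), (39, 200)]  # Jun 22, Jun 26, Jul 17, Jul 31
print(simulate(trips, 240, 0.5, 365, osum_rule))   # 540.0
print(simulate(trips, 240, 0.5, 365, sum_rule))    # 740.0
```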

Of course, OSUM’s advantage over SUM shrinks if there are many cheap re- 
quests, and if all requests are infinitesimally small then OSUM converges to SUM. 
Nevertheless, OSUM should be used by frequent travelers who expect to buy more 
tickets in the near future, whereas SUM should be preferred by sporadic travelers 
with a low probability of traveling. The next theorem shows that OSUM is as 
optimal as SUM. 

Theorem 5. OSUM is $(2 - \beta)$-competitive for BP$(C, \beta, T)$. 

Proof. (Sketch) Augmenting the proof of Theorem 4, we define a critical phase 
as an interval $I = (t_1, t_2]$, where OSUM buys a Bahncard at time $t_2$, $t_1$ is the 
maximum of $t_2 - T$ and the expiration time of OSUM's previously bought Bahncard, 
and OPT has no valid Bahncard at any time in $I$. Then we can charge 
the cost of each of OSUM's Bahncards uniquely to either an expensive phase or a 
critical phase. A critical phase is induced by a request $(t,p)$ with 
$p \ge a := \frac{c_{crit} - s}{2}$, where $s = rr^T_{OSUM}(t^-)$. Hence 

$$c_{OSUM} = C + s + \beta p \quad\text{and}\quad c_{OPT} = s + p .$$

Therefore 

$$\frac{c_{OSUM}}{c_{OPT}} = \frac{C + s + \beta p}{s + p} \le \frac{C + s + \beta a}{s + a} = 2 - \beta ,$$

because the second quotient is maximal if $p$ is minimal. □ 



5 Randomized Online Algorithms 

We define R-SUM (R-OSUM) as a randomized variant of SUM (OSUM) which, with 
probability $q = \frac{1}{1+\beta}$, buys a Bahncard at time $t$ iff SUM (OSUM) would buy one 
at time $t$. It is easy to see from the proof of the next theorem that $\frac{1}{1+\beta}$ is the 
optimal choice for this probability. 

Theorem 6. R-SUM and R-OSUM are $\frac{2}{1+\beta}$-competitive for BP$(C, \beta, T)$. 

Proof. (Sketch) We only show the theorem for R-SUM. We use the same notation 
as in the proof of Theorem 4. Let $I = I_1 \cup I_2 \cup I_3$ be an expensive phase. Then 

$$c_{R\text{-}SUM} \le qC + s_2 + (q\beta + 1 - q) \cdot (s_1 + s_3)$$

and 

$$c_{OPT} = C + \beta(s_1 + s_2 + s_3) .$$

Hence, $\frac{c_{R\text{-}SUM}}{c_{OPT}} \le \frac{2}{1+\beta}$, because the first quotient is maximal if $s_2$ is maximal, 
and with $s_2 = c_{crit}$ and $q = \frac{1}{1+\beta}$ it is constant. □ 

Note that $\frac{2}{1+\beta} < 2 - \beta$ if $\beta \in (0,1)$, so R-SUM usually beats SUM. It is 
$\frac{4}{3}$-competitive for the GBP, but for the SRP it is identical to the deterministic SUM 
algorithm. 
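
The role of the probability $q$ in the proof sketch of Theorem 6 can be made explicit by a short calculation. The following is our own worked derivation from the two cost bounds of that proof, assuming $0 < \beta < 1$ and $s_2 = c_{crit} = C/(1-\beta)$.

```latex
\begin{align*}
c_{OPT} &= C + \beta c_{crit} + \beta (s_1 + s_3) = c_{crit} + \beta (s_1 + s_3), \\
c_{R\text{-}SUM} &\le qC + c_{crit} + (q\beta + 1 - q)(s_1 + s_3)
  = \bigl(1 + q(1-\beta)\bigr)\, c_{crit} + \bigl(1 - q(1-\beta)\bigr)(s_1 + s_3).
\end{align*}
% The bound on the quotient does not depend on s_1 + s_3 exactly when the two
% coefficient ratios agree:
\[
\frac{1 - q(1-\beta)}{\beta} = 1 + q(1-\beta)
\iff q = \frac{1}{1+\beta},
\qquad\text{which gives}\qquad
1 + q(1-\beta) = \frac{2}{1+\beta}.
\]
```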

We now consider the case that $T = \infty$, i.e., a Bahncard never expires. This 
makes the problem more similar to the well-understood SRP. In this case, time 
is no longer important, and we can w.l.o.g. assume that the behaviour of an 
algorithm at any moment is completely determined by the sum of all previous 
requests. A deterministic algorithm A can thus be described by a single positive 
number $s_A$, meaning that A buys a Bahncard if the cost has reached $s_A$. 
A randomized algorithm Q can be described by a monotone increasing function 
$P_Q : [0,\infty] \to [0,1]$, where $P_Q(s)$ is the probability that Q has a Bahncard after 
the cost has reached $s$. Since small requests work in favour of the adversary, we 
can further assume w.l.o.g. that the total ticket cost is a continuous function of 
time (and monotone increasing, of course). Then, a request sequence is also just 
a positive number $s$, namely the sum of all requests, and the expected cost of Q 
on $s$ is 

$$c_Q(s) = P_Q(s) \cdot C + s - (1 - \beta) \cdot \int_0^s P_Q(x)\,dx .$$

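We now define the randomized algorithm RAND by 

$$P_{RAND}(s) = \begin{cases} \dfrac{e^{s/c_{crit}} - 1}{e - 1 + \beta} & \text{if } s \le c_{crit}, \\[1ex] \dfrac{e - 1}{e - 1 + \beta} & \text{if } s \ge c_{crit}. \end{cases}$$

Theorem 7. RAND is $\frac{e}{e-1+\beta}$-competitive for BP$(C, \beta, \infty)$. 

Proof. (Sketch) Let $s$ be a request sequence. If $s \le c_{crit}$ then $c_{OPT}(s) = s$ and 
$c_{RAND}(s) = \frac{e}{e-1+\beta} \cdot s$. If $s > c_{crit}$ then $c_{OPT}(s) = C + \beta s$ and 

$$c_{RAND}(s) = c_{RAND}(c_{crit}) + \bigl(P_{RAND}(c_{crit}) \cdot \beta + 1 - P_{RAND}(c_{crit})\bigr) \cdot (s - c_{crit})$$

$$= \frac{e}{e-1+\beta} \cdot c_{crit} + \frac{e\beta}{e-1+\beta} \cdot (s - c_{crit}) = \frac{e}{e-1+\beta} \cdot (C + \beta s) .$$

Note that RAND always beats R-SUM or R-OSUM. If $\beta = 0$ then the Bahncard 
Problem becomes the SRP, and RAND behaves like the optimal $\frac{e}{e-1}$-competitive 
randomized Ski-Rental algorithm with $N = \infty$ [1,8]. 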
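
The claimed ratio can be checked numerically from the expected-cost formula for $T = \infty$. The sketch below assumes the distribution $P_{RAND}(s) = (e^{s/c_{crit}} - 1)/(e - 1 + \beta)$ for $s \le c_{crit}$, held constant afterwards, and approximates the integral with a midpoint rule; the function names and the parameters C = 240, $\beta$ = 1/2 are illustrative choices of ours.

```python
import math


def p_rand(s, C, beta):
    """Assumed buying distribution of RAND, constant beyond c_crit."""
    c_crit = C / (1 - beta)
    x = min(s, c_crit)
    return (math.exp(x / c_crit) - 1) / (math.e - 1 + beta)


def expected_cost(P, s, C, beta, steps=20000):
    """c_Q(s) = P(s)*C + s - (1 - beta) * integral_0^s P(x) dx (midpoint rule)."""
    h = s / steps
    integral = sum(P((k + 0.5) * h) for k in range(steps)) * h
    return P(s) * C + s - (1 - beta) * integral


def opt_cost(s, C, beta):
    """Offline optimum for T = infinity: never buy, or buy at the very start."""
    return min(s, C + beta * s)


C, beta = 240.0, 0.5
target = math.e / (math.e - 1 + beta)
for s in (100.0, 480.0, 1000.0):
    cost = expected_cost(lambda x: p_rand(x, C, beta), s, C, beta)
    print(s, cost / opt_cost(s, C, beta), target)
```

For every total cost $s$ the quotient agrees with $e/(e-1+\beta)$ up to the integration error, which is the equalizing property used in the proof sketch.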

Theorem 8. No randomized online algorithm for BP$(C, \beta, T)$ can be better than 
$\frac{e}{e-1+\beta}$-competitive. 

Proof. (Sketch) We first prove the theorem for BP$(C, \beta, \infty)$. If the request 
sequence $s$ is cheap, i.e., $s < c_{crit}$, then the expected cost of a randomized 
algorithm Q is small if $P_Q(s')$ is small for $s' \le s$. On the other hand, if $s > c_{crit}$ 
then Q comes out better if $P_Q(c_{crit})$ is high. In R-SUM, we chose the extreme 
approach: We do not buy unless the cost reaches $c_{crit}$. However, distributing 
the probability of buying at $s = c_{crit}$ over the interval $[0, c_{crit}]$ reduces the cost 
of expensive request sequences while raising the cost of cheap ones. One can 
show that the optimal probability distribution is $P_{RAND}$ which yields the same 
competitive ratio of $\frac{e}{e-1+\beta}$ on all request sequences. 

Since we can easily transform the request sequences used in the proof above 
into sequences within time interval $[0,T)$, the theorem follows. □ 

To generalize RAND to arbitrary T we define another randomized algorithm 
RAND2: For $0 \le \gamma \le c_{crit}$, let $\gamma$-SUM be the deterministic algorithm which buys 
a Bahncard at a regular request $(t,p)$ iff $rr^T_{\gamma\text{-SUM}}(t) \ge \gamma$ (so SUM is $c_{crit}$-SUM). 
RAND2 chooses $\gamma \in [0, c_{crit}]$ randomly such that the probability of $\gamma \in [0, s]$ is 
$P_{RAND}(s)$, for $s \in [0, c_{crit}]$. If $T = \infty$ then RAND2 is identical to RAND and hence 
optimal. 

Conjecture 9. RAND2 is optimally competitive for BP$(C, \beta, T)$. □ 

Acknowledgements. We want to thank Kurt Mehlhorn for his comments on 
a preliminary version of this paper. 

Abstract. We consider the problem of searching on m concurrent rays for 
a target of unknown location. If no upper bound on the distance to 
the target is known in advance, then the optimal competitive ratio is 
$1 + 2m^m/(m-1)^{m-1}$. We show that if an upper bound of D on the 
distance to the target is known in advance, then the competitive ratio 
of any search strategy is at least $1 + 2m^m/(m-1)^{m-1} - O(1/\log^2 D)$ 
which is also optimal, but in a stricter sense. 

We also construct a search strategy that achieves this ratio. Astonishingly, 
our strategy works equally well for the unbounded case, that is, 
if the target is found at distance D from the starting point, then the 
competitive ratio is $1 + 2m^m/(m-1)^{m-1} - O(1/\log^2 D)$ and it is not 
necessary for our strategy to know an upper bound on D in advance. 

1 Introduction 

Searching for a target is an important and well studied problem in robotics. In 
many realistic situations the robot does not possess complete knowledge about 
its environment, for instance, the robot may not have a map of its surroundings, 
or the location of the target may be unknown [3,6,10,14,17]. Since the robot has 
to make decisions about the search based only on the part of its environment 
that it has explored before, the search of the robot can be viewed as an on-line 
problem. One way to judge the performance of an on-line search strategy is to 
compare the distance traveled by the robot to the length of the shortest path 
from its starting point s to the target t. The ratio of the distance traveled by 
the robot to the optimal distance from s to t over all possible locations of the 
target is called the competitive ratio of the search strategy [18]. 

We are interested in obtaining upper and lower bounds on the competitive 
ratio of searching on m concurrent rays. Here, a point robot is imagined to stand 
at the origin of m rays and one of the rays contains the target t whose distance 
to the origin is unknown. The robot can only detect t if it stands on top of 
it. It can be shown that an optimal strategy visits the rays in cyclic order and 
increases the step length each time by a factor of $m/(m-1)$ starting with a step 
length of 1 [1,4]. The competitive ratio $C_m$ achieved by this strategy is given by 
$1 + 2m^m/(m-1)^{m-1}$. If randomization is used, the optimal competitive ratio is 
given by the minimum of the function $1 + 2a^m/((a-1) \ln a)$, for $a > 1$ [4,9,8]. 

Searching on m rays has proven to be a very useful tool for searching in a 
number of classes of simple polygons, such as star-shaped polygons [16], 
generalized streets [3,15], HV-streets [2], and $\theta$-streets [2,5]. 

This research is supported by the DFG-Project “Diskrete Probleme”, No. Ot 64/8-1. 

However, the proof of optimality for the above m-way ray searching strategy 
relies on the unboundedness of the rays, that is, on the fact that the target can 
be placed arbitrarily far away from the starting point of the rays [1,4]. But, if 
we consider polygons and the robot is equipped with a range finder, then it is 
possible to obtain an upper bound D on the distance to the target. In this case it 
is implicitly assumed that the strategy for searching on m-rays remains optimal 
though no proof of this assumption has been presented yet [2,3,15]. 

In this paper we provide the first lower bound proof for searching on m 
bounded rays; more precisely, we investigate the question if the knowledge of an 
upper bound on the distance to the target provides an advantage to the robot. 

Let $C_m^D$ be the optimal competitive ratio to search on m rays where the 
distance to the target is at most D. As mentioned above it is assumed that 
$C_m^D$ approaches $C_m$ as D goes to infinity; yet, there is only a proof for the case 
m = 2 by Lopez-Ortiz who shows that $9 - O(1/\log D)$ is a lower bound for 
the competitive ratio of searching on two rays [13]. In a similar vein, Icking 
et al. investigate the maximal reach of a strategy to search on the line if the 
competitive ratio of the strategy is given [7]. The reach of a strategy X is the 
maximum distance D such that a target placed at a distance D to the origin 
is still detected by a robot using X if the competitive ratio C is given. Icking 
et al. derive a recurrence equation for the optimal reach which implies that the 
reach is continuous and strictly monotone in D [7]. This in turn implies that $C_2^D$ 
is strictly monotone in D and assumes all values in the interval [3,9]. 

In this paper we prove that 

$$1 + 2m^m/(m-1)^{m-1} - O(1/\log^2 D) \qquad (1)$$

is a lower bound on $C_m^D$ for general m; this also improves Lopez-Ortiz’ bound for 
m = 2. Moreover, we present a strategy that achieves a competitive ratio of the 
same form as Equation 1, albeit with a different constant factor in the “big-Oh” 
term. Here, D is the distance at which the target is discovered. Astonishingly, 
our strategy achieves this competitive ratio without knowing an upper bound 
on D in advance. These two results imply that knowing an upper bound on the 
distance in advance does not improve the competitive ratio significantly. Note 
that all previously known strategies have a competitive ratio of 
$1 + 2m^m/(m-1)^{m-1} - O(1/D)$ if the target is detected at distance D. 

The paper is organized as follows. In the next section we present some 
definitions concerning searching on m rays. In Section 3 we show that an optimal 
strategy to search on m bounded rays is periodic and monotone. In Section 4 
we first consider searching on two rays to introduce our approach to analysing 
the competitive ratio of an optimal strategy. In Section 5 we generalize our 
ideas from the case of searching on two rays to m rays. Finally, in Section 6 we 
present a strategy whose competitive ratio converges asymptotically as fast to 
$1 + 2m^m/(m-1)^{m-1}$ as the lower bound which we have shown before. 

Due to the limited space all the proofs are omitted. 

2 Definitions 

Let X be a strategy to search on m rays. We model X as a sequence of positive 
real numbers, that is, $X = (x_0, x_1, x_2, \ldots)$ with $x_k > 0$, for all $0 \le k < \infty$. We 
illustrate this for the case of a point robot searching on the real line, that is, 
m = 2. 

In the beginning the position of the robot is a point s on the real line; it 
has to find a target t that is located somewhere to its left or right. It can only 
detect t if it stands on top of it. The robot starts at the origin s and travels for 
a distance of $x_0$ to one side, say to the left. The robot returns to s, travels a 
distance of $x_1$ to the right, returns, and so on. Obviously, the values $x_i$ which 
denote the distance that the robot travels to the left or to the right of s suffice 
to characterize a search strategy completely. 

The Competitive Ratio Assume that the target is discovered in Step $k+2$, 
say to the left of the origin. Clearly, the ray to the left of the origin was visited 
the last time before Step $k+2$ in Step $k$. Hence, the distance $d$ to the target is 
greater than $x_k$. The distance traveled by the robot to discover t is $d + 2\sum_{i=0}^{k+1} x_i$. 
Since obviously $d > x_k$ and the target can be placed arbitrarily close to $x_k$ by an 
adversary, the highest lower bound on the competitive ratio of Step $k$ is given 
by the expression 

$$\sup_{d > x_k} \Bigl(d + 2\sum_{i=0}^{k+1} x_i\Bigr)/d = \sup_{d > x_k} \Bigl(1 + 2\sum_{i=0}^{k+1} x_i/d\Bigr) = 1 + 2\sum_{i=0}^{k+1} x_i/x_k .$$

Note that the above expression depends only on elements of X. For general m 
the competitive ratio is given by $1 + 2\sum_{i=0}^{k+m-1} x_i/x_k$ if the rays are visited 
cyclically. 
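
The step-wise ratio can be evaluated directly for the cyclic strategy with step lengths $x_i = (m/(m-1))^i$ mentioned in the introduction. The sketch below is illustrative only; the function names and the number of simulated steps are our own choices.

```python
def limit_ratio(m):
    """The value 1 + 2 m^m / (m-1)^(m-1)."""
    return 1 + 2 * m ** m / (m - 1) ** (m - 1)


def step_ratios(m, steps=100):
    """Ratios 1 + 2 * (x_0 + ... + x_{k+m-1}) / x_k for x_i = (m/(m-1))^i."""
    base = m / (m - 1)
    x = [base ** i for i in range(steps)]
    prefix = [0.0]                      # prefix[j] = x_0 + ... + x_{j-1}
    for v in x:
        prefix.append(prefix[-1] + v)
    return [1 + 2 * prefix[k + m] / x[k] for k in range(steps - m + 1)]


def worst_ratio(m, steps=100):
    return max(step_ratios(m, steps))


for m in (2, 3, 4):
    print(m, worst_ratio(m), limit_ratio(m))
```

For m = 2 this is the doubling strategy, whose step ratios increase towards 9; the gap to the limit at Step k is proportional to $1/x_k$.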

The first step is a special case that we have not considered yet. If no infor- 
mation about the target is available, then one false move in the beginning may 
lead to an arbitrarily large competitive ratio. In order to avoid this problem we 
assume that a lower bound of one for the distance to the target t is known in 
advance. 

3 Searching on m Rays 

We are interested in the case that an upper bound on the maximum distance 
of the target to the origin is known. We now model a strategy X as a finite 
sequence of positive numbers, that is, $X = (x_0, \ldots, x_n)$, for some $n \ge 0$. 

3.1 Periodicity 

In order to prove a lower bound on the competitive ratio, we first prove some 
properties of optimal strategies, that is, strategies with minimal competitive 
ratio. If we denote the ray that the robot visits in Step $k$ by $r_k$, then a strategy 
is periodic if $r_{k+m} = r_k$ for all $0 \le k \le n - m$. A strategy is monotone if 
$x_{i+1} \ge x_i$, for all $0 \le i \le n - 1$. We can show that there is an optimal strategy 
that is periodic and monotone. 

Lemma 1. There is an optimal strategy that is monotone and periodic up to 
the last step. 

By Lemma 1 it suffices to consider monotone and periodic strategies in the 
following. Note that if X is monotone, then the last m steps of X all have length 
D, that is, there is an optimal strategy with $x_{n-m+1} = \cdots = x_n = D$. 

3.2 A Recurrence Equation 

In the following we assume that X is an optimal periodic strategy. The competitive 
ratio of X in Step $k$ is given by $1 + 2F_k(X)$ where 

$$F_k(X) = \sum_{i=0}^{k+m-1} x_i / x_k ,$$

for $k = 0, \ldots, n-m$. Note that the robot visits ray $r_1$ in Step $n-m+1$; 
hence, $x_{n-m+1} = D$ and $F_{n-m+1}(X)$ is not relevant for the computation of the 
competitive ratio. Let $c_X = \max_{0 \le k \le n-m} F_k(X)$. It can be shown that if X is 
an optimal strategy, then the values of $F_k(X)$ are all the same, for 
$0 \le k \le n - m$ [12]. 

Lemma 2. If X is an optimal strategy, then $F_k(X) = c_X$, for all $0 \le k \le n - m$. 

Note that if X is an optimal strategy, then $1 + 2c_X = C_m^D$. Lemma 2 implies 
that if X is an optimal strategy, then 

$$x_{k+m-1} - c_m^D x_k + c_m^D x_{k-1} = 0, \qquad (2)$$

for all $1 \le k \le n - m$, where $c_m^D = (C_m^D - 1)/2$. Equation 2 completely defines 
the sequence $X = (x_0, x_1, \ldots, x_n)$ if we are given the values $x_0, \ldots, x_{m-1}$ (which 
we do not know); however, we know the values of $x_{n-m+1}, \ldots, x_n$ since $x_i = D$ 
in the last m steps. Unfortunately, the value of $x_n$ is irrelevant since $x_n$ does not 
appear in Equation system (2). Instead, the m-th boundary value of Equation 2 
is given by $x_{-1} = 1$ as this is the minimal distance to the target. 

In order to obtain m consecutive boundary values, consider $x_{n-m}$. It is easy to 
see that $x_{n-m} \ge D/2e$. If we now require that $x_i = D/2e$, for $i = n-m, \ldots, n-1$, 
then it can be shown that the competitive ratio of an optimal strategy that 
satisfies these equations is less than the competitive ratio of an optimal strategy 
for the original problem. For convenience, we neglect the division by 2e in the 
following as this only influences the constant of the “big-Oh” term. 

In order to make use of this information we consider the sequence Y of the 
values of X in reverse order, that is, $y_i = x_{n-i-1}$, for $i = 0, \ldots, n$. For simplicity 
we write $c$ instead of $c_m^D$ in the following. The values $y_i$ satisfy the following 
recurrence 

$$y_{k+m} - y_{k+m-1} + y_k / c = 0, \qquad (3)$$

for $0 \le k \le n - m$. 

The initial steps again have to be considered separately. The worst case 
competitive ratio the first time the mth ray is visited is $1 + 2\sum_{i=0}^{m-2} x_i$ which 
implies that $1 + 2\sum_{i=0}^{m-2} x_i \le 1 + 2c$ and, hence, the value of $\sum_{i=n-m+1}^{n-1} y_i$ is 
at most $c$. In addition, we note that all the values $y_0, \ldots, y_n$ have to be positive. 
We assume in the following that Y is given by Equation 3 which defines an 
infinite sequence some of whose elements may be negative. 

In order to prove a lower bound on the competitive ratio $1 + 2c$ we show the 
following theorem. 

Theorem 1. If $c < m^m/(m-1)^{m-1} - O(1/\log^2 D)$, then there is no sequence 
Y and no $n \ge 0$ such that Y satisfies Equation 3, $\sum_{i=n-m+1}^{n-1} y_i \le c$, 
$y_0 = y_1 = \cdots = y_{m-1} = D$, and $y_0, \ldots, y_n > 0$. 

By the construction of Y we also obtain that there is no strategy X with a 
competitive ratio of $1 + 2c$ to search on m rays in the interval $[1, D]$. 

Lemma 3. If there is no sequence Y and no $n \ge 0$ such that Y satisfies Equation 3, 
$\sum_{i=n-m+1}^{n-1} y_i \le c$ and $y_0 = y_1 = \cdots = y_{m-1} = D$, and $y_m, \ldots, y_n > 0$, 
then there is no strategy X with a competitive ratio of $1 + 2c$ that searches on m 
rays for a target of distance at most D to the origin. 

3.3 The Characteristic Equation 

We only consider the sequence Y in the following. Equation 3 has the 
characteristic equation 

$$\lambda^m - \lambda^{m-1} + 1/c = 0 \quad\text{or}\quad c = 1/(\lambda^{m-1}(1 - \lambda)). \qquad (4)$$

We first note that since $\lambda^{m-1}(1 - \lambda) < 0$, for $\lambda > 1$, there is no positive real root 
larger than one. On the other hand, if we set $\mu = 1/\lambda$, then $c = \mu^m/(\mu - 1)$ and if 
there is a positive real root $\lambda$ of Equation 4 with $\lambda < 1$, then 
$c \ge \inf_{\mu > 1} \mu^m/(\mu - 1) = m^m/(m-1)^{m-1}$ and we are done. This implies that we can assume in the 
following that there is no positive real root of Equation 4. 

So we investigate the complex and negative roots of Equation 4 in more 
detail. 
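
The value of the infimum follows from a one-line calculus argument, which we spell out here as a worked step; the auxiliary function $f$ is our own notation.

```latex
\[
f(\mu) = \frac{\mu^m}{\mu - 1}, \qquad
f'(\mu) = \frac{m\mu^{m-1}(\mu - 1) - \mu^m}{(\mu - 1)^2}
        = \frac{\mu^{m-1}\bigl((m-1)\mu - m\bigr)}{(\mu - 1)^2}.
\]
% f' changes sign from negative to positive at mu = m/(m-1), so for mu > 1
\[
\inf_{\mu > 1} f(\mu) = f\!\left(\frac{m}{m-1}\right)
 = \left(\frac{m}{m-1}\right)^{m} (m - 1)
 = \frac{m^m}{(m-1)^{m-1}}.
\]
```

The minimizer $\mu = m/(m-1)$ is the growth factor of the cyclic strategy from the introduction.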

4 Solving the Recurrence Equation for m = 2 

In order to illustrate our approach we present the case m = 2 in greater detail. 
In the following we assume that $3 \le c < m^m/(m-1)^{m-1} = 4$. It is easy to see 
that if $c < 3$, then D is bounded by a constant. 

4.1 An Explicit Solution 

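For m = 2 Equation 4 reduces to $\lambda^2 - \lambda + 1/c = 0$ with the solutions 

$$\lambda = \frac{1}{2}\Bigl(1 + i\sqrt{(4-c)/c}\Bigr) \quad\text{and}\quad \bar\lambda = \frac{1}{2}\Bigl(1 - i\sqrt{(4-c)/c}\Bigr) .$$

Here, $\bar\lambda$ denotes the conjugate of $\lambda$. Hence, the solution to Equation 3 in the case 
m = 2 is given by 

$$y_k = a\lambda^k + \bar a\bar\lambda^k = 2\,\mathrm{Re}(a\lambda^k)$$

where Re denotes the real part of a complex number, and $a$ and $\bar a$ are given by 

$$a = \frac{D}{2}\Bigl(1 - i\sqrt{c/(4-c)}\Bigr) \quad\text{and}\quad \bar a = \frac{D}{2}\Bigl(1 + i\sqrt{c/(4-c)}\Bigr) .$$

4.2 Polar Coordinates 

If we consider the polar coordinates of $\lambda$ and $\bar\lambda$, that is, $\lambda = \rho e^{i\varphi}$ and 
$\bar\lambda = \rho e^{-i\varphi}$, then $\rho = \sqrt{1/c}$ and $\varphi = \arctan(\sqrt{(4-c)/c})$. Similarly, for 
$a = |a| e^{i\theta}$ and $\bar a = |a| e^{-i\theta}$ we obtain $|a| = D/\sqrt{4-c}$ and 
$\theta = -\arctan(\sqrt{c/(4-c)})$. Hence, 

$$y_k = a\lambda^k + \bar a\bar\lambda^k = \frac{2D}{\sqrt{c^k(4-c)}} \cos\Bigl(k \arctan\sqrt{\tfrac{4-c}{c}} - \arctan\sqrt{\tfrac{c}{4-c}}\Bigr) .$$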




If we visualize the above equation in the complex plane, then $y_k$ is the projection 
of the vector $2a\lambda^k$ onto the x-axis. If we multiply two complex numbers, 
then the radii are multiplied and the angles are added. Hence, the sequence $2a\lambda^k$ 
turns by an angle of $\varphi$ towards the second quadrant with each iteration. Once 
$2a\lambda^k$ is in the second quadrant, $2\,\mathrm{Re}(a\lambda^k)$ is negative. Since all elements $y_i$ have 
to be positive, the first index $k$ with $y_{k+1} < 0$ is the maximum length of Y (and 
X). This idea was used before to prove that there is no strategy to search on the 
(unbounded) line with a competitive ratio of less than nine since a strategy to 
search on the real line cannot be of finite length [1,5,7,11]. $y_k$ becomes negative 
as soon as 

$$k \arctan\sqrt{\frac{4-c}{c}} - \arctan\sqrt{\frac{c}{4-c}} \in (\pi/2, 3\pi/2) .$$
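
The sign change can be observed numerically by iterating the recurrence for m = 2, that is $y_{k+2} = y_{k+1} - y_k/c$ with $y_0 = y_1 = D$, and comparing it with the polar form $y_k = \frac{2D}{\sqrt{c^k(4-c)}}\cos(k\varphi + \theta)$. The sketch is illustrative: the function names and the sample values c = 3.5 and D = 1000 are our own choices.

```python
import math


def y_recurrence(c, D, n):
    """y_0, ..., y_n from y_{k+2} = y_{k+1} - y_k / c with y_0 = y_1 = D (n >= 1)."""
    y = [D, D]
    for _ in range(n - 1):
        y.append(y[-1] - y[-2] / c)
    return y


def y_closed(c, D, k):
    """Polar form of the solution for m = 2 and 0 < c < 4."""
    phi = math.atan(math.sqrt((4 - c) / c))
    theta = -math.atan(math.sqrt(c / (4 - c)))
    return 2 * D / math.sqrt(c ** k * (4 - c)) * math.cos(k * phi + theta)


def first_negative_index(c, D, limit=10_000):
    """Smallest k with y_k < 0 (or `limit` if none is found before that)."""
    prev, cur, k = D, D, 1
    while cur >= 0 and k < limit:
        prev, cur = cur, cur - prev / c
        k += 1
    return k


c, D = 3.5, 1000.0
for k, y in enumerate(y_recurrence(c, D, 8)):
    print(k, round(y, 3), round(y_closed(c, D, k), 3))
print(first_negative_index(c, D))  # 8
```

As c approaches 4 the angle $\varphi$ shrinks, so more iterations are needed before the sequence turns negative.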

We show that D can be chosen large enough such that either < 0 and 
yn > c or yn-i/yn > c. Our previous considerations imply that in both cases 
there is no strategy to search on the real line for a target at a distance at most 
D with a competitive ratio of 1 + 2c. 

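Of course, we are interested in the smallest D for which the above inequalities 
hold. 

Let $n_0$ be the first index such that $y_{n_0} < 0$, that is, 

$$\cos\Bigl(n_0 \arctan\sqrt{\tfrac{4-c}{c}} - \arctan\sqrt{\tfrac{c}{4-c}}\Bigr) < 0 \quad\text{or}\quad n_0 \arctan\sqrt{\tfrac{4-c}{c}} > \frac{\pi}{2} + \arctan\sqrt{\tfrac{c}{4-c}} .$$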




Some rough estimates show that no < 9 /a/ 4 — c and yno -2 > D/VcWT^c 
With these two inequalities we can show the following result. 

Lemma 4. If 3<c<4 — 81/ log^(i4/16); then D/V c?l V^-c ^ 

Let 3<c<4 — 81/ log^(i4/16). Lemma 4 implies that ynQ -2 > and < 0 . 
Hence, if y^^-i < c, then [y^o-i yno- 2 ) /Vno-i > c; otherwise yno-i > c. This 
proves Theorem 1 for the case m = 2. 

5 Solving the Recurrence Equation for the General Case 

In the following we sketch our approach for general m. As for the case m = 2 we
want to show that if there are only complex or negative roots of Equation 4, then
the polar angle of the roots turns towards the second quadrant. However, the
details are much more involved than in the case m = 2 since we have many roots
of Equation 4 and the roots cannot be computed explicitly. One possibility to
get around this problem is to use estimates on the angles and radii of the roots.

If $\lambda$ is the root with the largest radius among all roots of Equation 4, then
after a sufficiently large number of steps the contribution of $\lambda$ dominates the
contribution of all other solutions and only the angle of $\lambda^k$ determines whether
the solution is positive or negative. Since the number of steps increases
logarithmically with D, a sufficiently large D then yields a negative sequence element.

Let $\lambda_0, \ldots, \lambda_{m-1}$ be the roots of Equation 4. The solution of the recurrence
is given by

$$y_k = a_0\lambda_0^k + a_1\lambda_1^k + \cdots + a_{m-1}\lambda_{m-1}^k.$$

We first investigate the structure of the roots $\lambda_i$, $0 \le i \le m-1$.

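Let $\lambda$ be a complex root of Equation 4. We consider the polar coordinates of
$\lambda$, that is, we set $\lambda = \rho e^{i\varphi}$. We can show the following relationship between $\rho$
and $\varphi$.

Lemma 5. If $\lambda = \rho e^{i\varphi}$ is a complex root of Equation 4, then
$\rho = \sin((m-1)\varphi)/\sin(m\varphi)$ and
$\lambda^{m-1}(\lambda - 1) = \rho^{m-1}\bigl(\rho\cos(m\varphi) - \cos((m-1)\varphi)\bigr)$.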
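
Lemma 5 can be sanity-checked numerically. The sketch below assumes that Equation 4 has the form $\lambda^m - \lambda^{m-1} + 1/c = 0$; this matches the case m = 2 above but is an assumption for general m. For a chosen angle it builds $\lambda$ from the radius formula of the lemma and recovers the value of c for which that $\lambda$ is a root. The function name is illustrative.

```python
import cmath
import math


def root_from_angle(m, phi):
    """Return (lam, c) with lam = rho*e^{i*phi} and rho taken from Lemma 5.

    Assumes Equation 4 reads lam^m - lam^(m-1) + 1/c = 0, so that
    lam^(m-1) * (lam - 1) must equal the real number -1/c.
    """
    rho = math.sin((m - 1) * phi) / math.sin(m * phi)
    lam = cmath.rect(rho, phi)
    value = lam ** (m - 1) * (lam - 1)          # should be real
    real_part = rho ** (m - 1) * (rho * math.cos(m * phi)
                                  - math.cos((m - 1) * phi))
    assert abs(value.imag) < 1e-12
    assert abs(value.real - real_part) < 1e-12
    return lam, -1.0 / real_part


if __name__ == "__main__":
    lam, c = root_from_angle(3, 0.2)
    print("lambda =", lam, "c =", c)
    print("residual =", abs(lam ** 3 - lam ** 2 + 1.0 / c))
```

Parametrizing by the angle avoids a polynomial root finder: every admissible angle yields one root together with the c it belongs to.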

The Polar Angle of a Root We first concentrate on the polar angle of a root
$\lambda$ of Equation 4.

Lemma 6. If $\lambda = \rho e^{i\varphi}$ is a complex root of Equation 4, then
$\varphi \in [2k\pi/(m-1),\ (2k+1)\pi/m]$, for some $0 \le k \le \lfloor m/2 \rfloor$.

We now can show that there is exactly one root $\lambda_k$ for each interval
$[2k\pi/(m-1),\ (2k+1)\pi/m]$ with $0 \le k < \lfloor m/2 \rfloor$.

Lemma 7. For each interval $[2k\pi/(m-1),\ (2k+1)\pi/m]$ with $0 \le k < \lfloor m/2 \rfloor$,
there is exactly one root $\lambda_k = \rho_k e^{i\varphi_k}$ of Equation 4 with
$\varphi_k \in [2k\pi/(m-1),\ (2k+1)\pi/m]$.

The above roots account for $\lfloor m/2 \rfloor$ roots of Equation 4. If m is odd, then there
is one root $\lambda_{\lfloor m/2 \rfloor}$ with $\varphi_{\lfloor m/2 \rfloor} = 2\lfloor m/2 \rfloor\pi/(m-1) = (2\lfloor m/2 \rfloor + 1)\pi/m = \pi$,
that is, $\lambda_{\lfloor m/2 \rfloor}$ is a negative real root. The remaining $\lfloor m/2 \rfloor$ roots are given by
the conjugates $\bar\lambda_k$ of $\lambda_k$ as in the case m = 2.

If $c < m^m/(m-1)^{m-1}$, then $\varphi_0$ is bounded from below as follows.

Lemma 8. 

^ . f 1 I 1 1 

The Radius of a Root We now consider the radius of a root of Equation 4.
We can show that $\lambda_0$ is the root with the largest radius. More precisely, we have
the following lemma.

Lemma 9. po > 1/3 and po/pk > 1 + l/(4m^), for all 1 < k < |~m/2]. 
The Coefficients We finally give upper and lower bounds on the radii of the
coefficients. 

Lemma 10. |a^/ao| < and |ao| > 



Putting it all Together We now put the estimates we obtained for the radii 
and the angles of the roots of Equation 4 as well as the coefficients into use. 

Lemma 11. For all k > 1, 

q2m^m+l \ / 



2|ao|pQ cos(q;/,) 



<Vk <2|ao|po cos(o/,) 



(1 + l/(4m^))^ 
where ak = Oq -\- k(fo. 

We claim that if 

c < rn^/{m — 1)^“^ — 22^m® log^ rn/ log^ 12, 



1 + 1 /( 4 m^))^ J 



(5) 



then there is a Step k such that yk > (? and yu +2 < 0- As in the case rn = 2 
this proves Theorem 1. 



In the following let s: = ^^rn'^ /{rn — l)^-i — c. We assume that e < 1. In 
the case s' > 1 it is easy to see that D can be only a constant. Let ko be the first 
index greater than 4m^(3mlogm — loge) + 1 such that 

cos(6^o + ^ 0 + 0 ) > 0 £^nd cos(6^o + (^o + l)+o) ^ 0- 

We can show the following bounds on yko-i and yko-\- 2 ' 

Lemma 12. yk^-i > 2|ao|+^+ and y*o +2 < -2|ao|+'^^^- 

We now bound the value of ko. Since the distance between two consecutive 
transitions from positive to negative values of cosine is at most 2tv and ko > 
4m^(3m log rn — log s) + 1 , we have that ko — 4m^(3m log rn — log s) — 1 < 2ti / ipo 
and by Lemma 8 

27t 27rm^^^ 

ko < 4m^(3mlog m — logs) + 1 H < 4m^(3m logm — logs) + 1 H (6) 

+0 

With the above preparations our main lemma can be shown. 

Lemma 13. If c satisfies Inequality 5, then yko-i > ctnd yu^ +2 < 0. 

Since yko +2 < 0, the last step of the strategy is Step /^o + 1- If m > 4, then the 
sum includes yk^-i and, hence, is larger than c which proves 

Theorem 1. If m = 3 and X*+o4i-3+2 Vi = Vko+Vko+i < c, then yko-i/Vko > c 
and as in Section 4 we see that this also contradicts the existence of a strategy 
with a competitive ratio of 1 + 2c. If we recall that we have neglected the division 
of D by 2e, then we obtain the following theorem. 

Theorem 2. There is no search strategy for a target on m rays which is
contained in the interval [1, D] with a competitive ratio of less than

22‘^m^ log^ m' 



1 + 2 



m 



[rn — 1) 



m— 1 



\og\D/2e) 
6 An Optimal Strategy

After having proven a lower bound for searching on m rays with an upper bound
on the target distance, one of the questions that remains is whether there actually
is an optimal strategy that achieves a competitive ratio of
$1 + 2m^m/(m-1)^{m-1} - O(1/\log^2 D)$ and what it looks like. In this section we present a strategy to search
on m rays that achieves the optimal competitive ratio even if the maximum
distance D of the target to the starting point is unknown; that is, being told an
upper bound on the distance to the target is not a big advantage, even if we
consider the convergence rate of the competitive ratio to $1 + 2m^m/(m-1)^{m-1}$
as D increases.

The strategy $X = (x_1, x_2, \ldots)$¹ that achieves a competitive ratio of
$1 + 2m^m/(m-1)^{m-1} - O(1/\log^2 D)$ is given by $x_i = \sqrt{1 + i/m}\,(m/(m-1))^i$. The
competitive ratio $C_X$ of Strategy X in Step k + m is bounded by $1 + 2c_k$ where
$c_k$ is given by

$$c_k = \frac{\sum_{i=1}^{k+m-1} x_i}{x_k} = \sum_{j=0}^{m-1}\sqrt{1 + \frac{j}{k+m}}\left(\frac{m}{m-1}\right)^{j} + \sum_{j=1}^{k-1}\sqrt{\frac{j+m}{k+m}}\left(\frac{m-1}{m}\right)^{k-j},$$




where we assume $k \ge 1$. If we use the Taylor expansion of $\sqrt{1+x}$, then the first
sum is bounded by

r ~j / m V ^ 1 (m — l)m 

V k -\- m \m — 1 / ~ {m — l)^-i 2 k -\- m ' 

Now we bound the second sum, this time using the Taylor expansion of $\sqrt{1-x}$:



k-i 

E 

.7 = 1 



j -\- rn [ rn — 1 



k ^ m \ m 



k-j 



< m — 1 — m 



m — 1 /m(m — 1) — (/^ — m — l)m 



m 



k -\- m 



[k T kn) 



m 



1 / m(m — l)(2m — 1) /m— 1\^/^^T 2k(m — 2) + 2m^ — 3m T 1 

' — — m ' ‘ 



[k T kn) 



Hence,

$$c_k = \sum_{j=1}^{k+m-1}\sqrt{\frac{j+m}{k+m}}\left(\frac{m}{m-1}\right)^{j-k} \le \frac{m^m}{(m-1)^{m-1}} - \frac{1}{8}\,\frac{m(m-1)(2m-1)}{(k+m)^2}.$$




¹ For convenience we start with $x_1$ instead of $x_0$.
Finally, we relate the number of steps k + m to the distance D to the target.
If the target is detected in Step k + m, then the distance to s is in the interval
$[\sqrt{1 + k/m}\,(m/(m-1))^{k},\ \sqrt{1 + (k+m)/m}\,(m/(m-1))^{k+m}]$ and
$k \le \log_{1+1/(m-1)} D \le (m-1)\log D$. Hence,

$$c_k \le \frac{m^m}{(m-1)^{m-1}} - \frac{2m-1}{8\log^2(3D)}.$$

We have shown the following theorem. 

Theorem 3. There is a strategy X that achieves a competitive ratio of

$$1 + 2\,\frac{m^m}{(m-1)^{m-1}} - \frac{2m-1}{4\log^2(3D)}$$

if the target is placed at distance $D \ge 1$ to s.

By Theorem 2 the strategy we have presented above is optimal. 
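
The convergence of the strategy can be observed numerically. The following Python sketch evaluates $c_k = (x_1 + \cdots + x_{k+m-1})/x_k$ for step lengths $x_i = \sqrt{1 + i/m}\,(m/(m-1))^i$ and compares it with $m^m/(m-1)^{m-1}$. It is only an illustration of the quantities named in this section; the function names are hypothetical and the code is not the authors' implementation.

```python
import math


def step_ratio(m, k):
    """c_k = (x_1 + ... + x_{k+m-1}) / x_k for x_i = sqrt(1 + i/m) * (m/(m-1))**i.

    Each quotient x_i / x_k is evaluated directly as
    sqrt((m + i) / (m + k)) * (m/(m-1))**(i - k) to avoid large powers.
    """
    r = m / (m - 1)
    return sum(math.sqrt((m + i) / (m + k)) * r ** (i - k)
               for i in range(1, k + m))


def limit_value(m):
    """m^m / (m-1)^(m-1), the value that c_k approaches from below."""
    return m ** m / (m - 1) ** (m - 1)


if __name__ == "__main__":
    for m in (2, 3, 4):
        for k in (1, 10, 100, 1000):
            print(m, k, step_ratio(m, k), limit_value(m))
```

For m = 2 the limit is 4, which corresponds to the classical competitive ratio of 9 for searching on the line; the printed values stay below the limit and approach it as k grows.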



k-\-m— 1 

i=i 



\A+ 



k - 



m 



2m — 1 



1 )^ 



8 ( log D 



7 Conclusions 

We present a lower bound for the problem of searching on m concurrent rays if
an upper bound D on the maximal distance to the target is given. We show that
in this case the competitive ratio of a search strategy is at least
$1 + 2m^m/(m-1)^{m-1} - O(1/\log^2 D)$. Our approach is based on deriving a recursive equation for
the step length in each iteration of an optimal strategy. The recursive equation
gives rise to a characteristic equation whose roots determine the properties of a
strategy. By computing upper and lower bounds on the radii and polar angles
of the roots we can show that the competitive ratio has to be sufficiently large
if the target is far away.

We also present a strategy which achieves a competitive ratio of
$1 + 2m^m/(m-1)^{m-1} - O(1/\log^2 D)$ if the target is detected at distance D. The strategy does
not need to know an upper bound on D in advance. Hence, the knowledge of an
upper bound on the distance to the target only provides a marginal advantage
to the robot; even the convergence rate is not improved.

An interesting open problem is to prove similar results for randomized
strategies. One of the problems with randomized strategies is that there is no published
proof that there is an optimal periodic strategy. It seems that this is a necessary
step before the bounded distance problem can be attacked.
Transformation and Rotation Transformation
Abstract. Approximation algorithms are developed for the diagonal-flip
transformation of convex polygon triangulations and, equivalently, the
rotation transformation of binary trees. For two arbitrary triangulations
in which each vertex is an end of at most d diagonals, Algorithm A has
the approximation ratio $2 - \frac{2}{4(d-1)(d+6)+1}$. For triangulations containing
no internal triangles, Algorithm B has the approximation ratio 1.97.
Two lower bounds on the diagonal-flip distance, of independent interest, are also
established in the analyses of these two algorithms.



1 Introduction 

A rotation in a binary tree is a local restructuring of the tree that changes the
position of a node and one of its children while the symmetric order in the tree
is preserved. This operation has found applications in many areas. In
data structures, rotations are the primitive used by most schemes that maintain
“balance” in binary trees [13,18]. In graphics, morphing polygons is abstracted
as rotations on weighted binary trees [6,8]. The rotation operation is also of
interest from a purely mathematical point of view [11]. Further, a similar but
slightly more powerful operation named nearest neighbor interchange is used
extensively for defining the dissimilarity between phylogenies and for heuristic search
for optimal phylogenies in biology [2,17].

The rotation operation on binary trees is equivalent to the diagonal-flip
operation in triangulations of a convex polygon, but the latter is more intuitive (for
example, see [15]). A diagonal-flip is an operation that converts one triangulation
of a polygon into another by removing a diagonal in the triangulation and
adding the diagonal that subdivides the resulting quadrilateral in the opposite
way. The diagonal-flip operation was first studied by Wagner [19] in the context
of arbitrary triangulated planar graphs and by Dewdney [5] in the case of
graphs of genus one. They showed that any such graph can be transformed to
any other by diagonal-flips. However, they did not try to accurately estimate
how many flips are necessary. In [16], Sleator, Tarjan and Thurston proved that
$\Theta(n\log n)$ diagonal-flips are necessary and sufficient for transforming a numbered
triangulated planar graph into another. In another paper [15], by using hyperbolic
geometry, Sleator, Tarjan and Thurston showed elegantly that $2n - 10$
diagonal-flips are sufficient and necessary for transforming one triangulation of
the n-gon into another when n is large, which improved an earlier work of Culik
and Wood [3]. Since then the diagonal-flip transformation has been studied in
several aspects [4,7,9].

Like [15], this paper works on diagonal-flips in triangulations rather than
rotations on binary trees. It is open whether a shortest diagonal-flip transformation
between two triangulations of a convex polygon is computable in polynomial
time [14]. Our interest is to develop an approximation algorithm with ratio better
than 2 for the diagonal-flip transformation, which is a hard problem raised
in [14] and, more recently, explicitly in [2]. Although there is a trivial approximation
with ratio 2, any better approximation turns out to be very difficult. In this
paper, we present an approximation algorithm that has a better approximation
ratio for triangulations of a convex polygon in which each vertex is an end of
constantly many diagonals. We also study the diagonal-flip transformation for a
special class of triangulations. In a triangulation of a convex polygon, a triangle
is said to be internal if it contains three diagonals. The class of triangulations
without internal triangles contains most of the interesting triangulations studied in
the literature (for example, see [11,7]). In fact, triangulations in this class correspond
one-to-one to binary trees without degree-3 nodes. For this class of
triangulations, we present a polynomial algorithm with approximation ratio 1.97.
The ratio can be reduced further by a more sophisticated argument similar to our
approach. However, we do not attempt to give the best possible ratio in this short
abstract. The complete proofs of these results can be found in the full version of
this paper.

The rest of this extended abstract is divided into six sections. Section 2
briefly introduces the rotation on binary trees and the diagonal-flip in triangulations
and shows the equivalence of these two operations. Section 3 gives two
transformation primitives. Using Proposition 3 in this section, we are also able to answer
negatively a problem posed by Knuth [11], who suspected that two specific
triangulations without internal triangles have a large diagonal-flip distance. Section
4 presents a polynomial algorithm (Algorithm A) that has the approximation
ratio $2 - \frac{2}{4(d-1)(d+6)+1}$ for triangulations in which each vertex is an end of at
most d diagonals. Section 5 presents a polynomial algorithm (Algorithm B) with
approximation ratio 1.97 for triangulations without any internal triangles.

2 Definitions 

2.1 Binary tree rotations 

A binary tree is a collection of nodes and three relations among these nodes:
parent, left child and right child. A special node is called the root. Every other
node has a parent and may have a left and/or right child. All nodes without any
child are leaves. A binary tree has size n if it contains n nodes. (See [10] for a
more complete description of binary trees and tree terminology.)

A rotation is an operation that changes one binary tree into another of the
same size. Figure 1 shows the general rotation rule. In a tree of size n, there are
$n - 1$ possible rotations, each corresponding to a non-root node. A rotation
maintains the symmetric order of the nodes. Furthermore, a rotation is an invertible
operation. We define the rotation distance between two trees as the minimum number
of rotations required to convert one tree into the other. The rotation distance
between two binary trees of size n is at most $2n - 6$ [15]. In [12], Pallo proposed
a heuristic search algorithm for computing the rotation distance.
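
The rotation rule can be made concrete with a small Python sketch. The `Node` class and the function names are illustrative assumptions; the point is only that a rotation exchanges a node with one of its children, preserves the symmetric order, and can be undone by the opposite rotation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    key: int
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def rotate_right(y: Node) -> Node:
    """Rotate the edge between y and its left child x; x becomes the subtree root."""
    x = y.left
    if x is None:
        raise ValueError("rotate_right needs a left child")
    y.left, x.right = x.right, y
    return x


def rotate_left(x: Node) -> Node:
    """Inverse of rotate_right: the right child y of x becomes the subtree root."""
    y = x.right
    if y is None:
        raise ValueError("rotate_left needs a right child")
    x.right, y.left = y.left, x
    return y


def inorder(t: Optional[Node]) -> list:
    """Keys of the tree in symmetric (in-order) order."""
    return [] if t is None else inorder(t.left) + [t.key] + inorder(t.right)


if __name__ == "__main__":
    tree = Node(4, Node(2, Node(1), Node(3)), Node(5))
    rotated = rotate_right(tree)
    print(rotated.key, inorder(rotated))
```

Each non-root node determines exactly one rotation (the one at the edge to its parent), which is why a tree of size n admits n - 1 rotations.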




Fig. 1. The definition of rotation. 



2.2 Diagonal-flips in triangulations 

Binary tree rotation can be formulated with respect to different systems of
combinatorial objects and their transformations. The diagonal-flip operation in
triangulations is perhaps more intuitive and so supplies more insight. Consider
the standard convex (n + 2)-gon. We choose an edge of the polygon as a
distinguished edge, called the “root edge”, and label its ends as 0 and n + 1. We also
label the other n vertices from 1 to n counterclockwise. Any triangulation of the
(n + 2)-gon has n triangles and n - 1 diagonals. From a triangulation of the
(n + 2)-gon, we derive a binary tree of size n by assigning a node to each triangle
and connecting two nodes if the corresponding triangles share a common
diagonal. The root of the tree corresponds to the triangle containing the root
edge. It is not difficult to see that the ith node of the binary tree in symmetric
order corresponds to the triangle with vertices i, j and k such that j < i < k.
In this way, we obtain a 1-1 correspondence between n-node binary trees and
triangulations of the (n + 2)-gon as illustrated in Figure 2.

A diagonal-flip is an operation that transforms one triangulation of a polygon
into another as shown in Figure 3. The diagonal-flip distance between two
triangulations $\pi_1$ and $\pi_2$ of a polygon is the minimum number of diagonal-flips needed
to convert one triangulation into the other, which is denoted by $f_d(\pi_1, \pi_2)$. Note
that $f_d(\pi_1, \pi_2) \le 2n - 10$ for any two triangulations $\pi_1$ and $\pi_2$ of the n-gon [15].
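
A diagonal-flip is easy to express on a set-of-diagonals representation. The sketch below is an illustration under assumptions that are not part of the paper: the vertices of the n-gon are numbered 1..n in order, and a triangulation is a Python set of pairs (a, b) with a < b. The function names are hypothetical.

```python
def is_edge(n, diagonals, a, b):
    """True if ab is a side of the n-gon (vertices 1..n) or a diagonal."""
    a, b = min(a, b), max(a, b)
    return b - a == 1 or (a, b) == (1, n) or (a, b) in diagonals


def flip(n, diagonals, d):
    """Return the triangulation obtained by flipping diagonal d.

    The two triangles on either side of d = (a, b) have apexes p and q;
    the flip replaces ab by pq, the other diagonal of the quadrilateral.
    """
    a, b = d
    if d not in diagonals:
        raise ValueError("not a diagonal of this triangulation")
    apexes = [v for v in range(1, n + 1)
              if v not in d
              and is_edge(n, diagonals, a, v)
              and is_edge(n, diagonals, b, v)]
    if len(apexes) != 2:
        raise ValueError("input is not a triangulation")
    p, q = sorted(apexes)
    return (diagonals - {d}) | {(p, q)}


if __name__ == "__main__":
    fan = {(1, 3), (1, 4), (1, 5)}          # hexagon, all diagonals at vertex 1
    print(sorted(flip(6, fan, (1, 4))))
```

Flipping the new diagonal restores the original triangulation, mirroring the invertibility of rotations.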
Fig. 2. A binary tree and its corresponding triangulation.




Fig. 3. A diagonal flip in a triangulation of the hexagon. 



Obviously, diagonal-flips in a triangulation correspond one-to-one to
rotations in the corresponding binary tree. Other interesting relationships between
a triangulation of a convex polygon and its corresponding binary tree can be
found in a nice survey article [1].

3 Diagonal-flip transformation primitives 

3.1 Difference Graphs 

Given two triangulations $\pi_1$ and $\pi_2$ of the n-gon, we define their difference
graph $G = G(\pi_1, \pi_2)$ as the union of these two triangulations. Formally, the
graph G has vertex set $V = \{1, 2, \ldots, n\}$ and edge set
$E = \{1n\} \cup \{i(i+1) \mid i = 1, 2, \ldots, n-1\} \cup E_{\pi_1} \cup E_{\pi_2}$,
where $E_{\pi_i}$ denotes the set of $n - 3$ internal diagonals of $\pi_i$.

If an edge is shared by both triangulations $\pi_1$ and $\pi_2$, we call it a face edge.
All the boundary edges $1n$ and $i(i+1)$, $1 \le i \le n-1$, are face edges. Since
two triangulations may have common diagonals, there are other face edges in
general. We call a subgraph inside a simple cycle consisting of face edges a
cell. It is not difficult to see that the difference graph G can be decomposed into
cells, which have disjoint diagonal edges. A difference graph $G(\pi_1, \pi_2)$ is simple
if it has only one cell, i.e., the two triangulations $\pi_1$ and $\pi_2$ do not have any common
diagonals. In the rest of this section, we will focus on simple difference graphs.

In a difference graph, an edge is said to be a diagonal edge if it corresponds
to a diagonal of one triangulation; it is a boundary edge otherwise. For a
diagonal edge $e \in E(G)$, let $G(e) = \{e' \in E(G) \mid e' \text{ intersects } e\}$, where two
edges having only a common end are not considered to intersect each other. All
edges in $G(e)$ are from the other triangulation. The cardinality $c(e)$ of $G(e)$ is
called the cross number of e. Obviously, $c(e) \ge 1$ for any non-face edge e. We
say a diagonal e is isolated if $c(e) = 1$. Finally, a vertex $v \in G(\pi_1, \pi_2)$ is pure
with respect to $\pi_1$ (resp. $\pi_2$) if it is adjacent only to $\pi_1$-diagonals (resp. $\pi_2$-diagonals); it
is mixed otherwise.
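
Cross numbers and isolated diagonals are straightforward to compute when the polygon vertices are numbered in order. The following Python sketch assumes, for illustration only, that each triangulation is given as a set of pairs (a, b) of vertex numbers; the helper names are hypothetical.

```python
def crosses(e, f):
    """True if diagonals e and f of a convex polygon properly intersect.

    Vertices are numbered in order around the polygon; diagonals that
    merely share an end do not count as intersecting.
    """
    a, b = sorted(e)
    c, d = sorted(f)
    if len({a, b, c, d}) < 4:
        return False
    return (a < c < b) != (a < d < b)


def cross_numbers(pi1, pi2):
    """Map each diagonal of pi1 and pi2 to its cross number c(e)."""
    result = {}
    for own, other in ((pi1, pi2), (pi2, pi1)):
        for e in own:
            result[e] = sum(crosses(e, f) for f in other)
    return result


def isolated(pi1, pi2):
    """Diagonals with cross number exactly 1."""
    return {e for e, c in cross_numbers(pi1, pi2).items() if c == 1}


if __name__ == "__main__":
    pi1 = {(1, 3), (1, 4), (1, 5)}   # fan at vertex 1 of the hexagon
    pi2 = {(2, 4), (2, 5), (2, 6)}   # fan at vertex 2
    print(cross_numbers(pi1, pi2))
    print(isolated(pi1, pi2))
```

Two diagonals cross exactly when one end of the second lies strictly between the ends of the first and the other end does not, which is what `crosses` tests.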



3.2 Transformation primitives 

Recall that $f_d(\pi_1, \pi_2)$ denotes the diagonal-flip distance between $\pi_1$ and $\pi_2$.

Proposition 1. ([15]) Let $\pi_1$ and $\pi_2$ be two triangulations of a polygon and let
$e \in \pi_1$ be an isolated diagonal in $G(\pi_1, \pi_2)$. Let $\pi_2'$ be the triangulation created
from $\pi_2$ by flipping the unique edge $e'$ that intersects e. Then (1) $\pi_2'$ has one more
common diagonal with $\pi_1$ than $\pi_2$; equivalently, the difference graph $G(\pi_1, \pi_2')$
has one more cell than $G(\pi_1, \pi_2)$; (2) $f_d(\pi_1, \pi_2') = f_d(\pi_1, \pi_2) - 1$.

Next, we study properties related to the degrees of vertices in difference
graphs. Let $\pi_1$ and $\pi_2$ be two triangulations of a polygon. For a vertex
$v \in G(\pi_1, \pi_2)$, the number of all diagonal edges adjacent to v is called the degree of
v, denoted by $d(v)$. By using Proposition 1, we can prove the following.

Proposition 2. Let $\pi_1$ and $\pi_2$ be two triangulations of a polygon and let v be a
vertex such that $d(v) = 1$ in $G(\pi_1, \pi_2)$. If the unique diagonal e adjacent to v is
in $\pi_1$ (resp. $\pi_2$), then flipping e creates a triangulation $\pi_1'$ (resp. $\pi_2'$) which has
one more diagonal in common with $\pi_2$ (resp. $\pi_1$) than $\pi_1$ (resp. $\pi_2$).

Proposition 3. Let $\pi_1$ and $\pi_2$ be two triangulations of a polygon and let u and
v be two adjacent degree-2 vertices in $G(\pi_1, \pi_2)$. If one of u and v is pure, then
it is possible to create $\pi_1'$ and $\pi_2'$ by flipping three of the four diagonals (which
can be determined easily) adjacent to u or v such that (1) $\pi_1'$ and $\pi_2'$ have two
more common diagonals than $\pi_1$ and $\pi_2$, and (2) $f_d(\pi_1', \pi_2') \le f_d(\pi_1, \pi_2) - 2$.



4 A better approximation for arbitrary triangulations 

4.1 An approximation algorithm 

In this section, we will present an approximation algorithm for the problem of
transforming triangulations. Formally, the problem is defined as follows.

Instance: Two triangulations of a polygon;

Output: A shortest diagonal-flip transformation between the given triangulations.

Our approximation algorithm is described in Table 1. Obviously, the algorithm
runs in polynomial time. Now we give some basic facts which will be used
in analyzing its approximation ratio. Recall that $G(\pi_1, \pi_2)$ denotes the difference
graph of $\pi_1$ and $\pi_2$.
Input: Two triangulations $\pi_1$ and $\pi_2$;

Do until the following ‘if’ condition fails
    if there are isolated diagonals then
        pick such an edge e;
        let $e'$ be the unique diagonal that intersects e;
        if $e' \in \pi_1$ then $\pi_1 := \pi_1 + e - e'$ else $\pi_2 := \pi_2 + e - e'$;
Enddo

Let the resulting polygon triangulations have k cells $P_i$ ($i \le k$), and let $\pi_j|_{P_i}$ denote
the restriction of $\pi_j$ on $P_i$ for j = 1, 2 and $i \le k$; assume $P_i$ has $n_i$ vertices.

For each cell $P_i$
    pick a node v;
    transform $\pi_1|_{P_i}$ into the unique triangulation $\pi$ all of whose diagonals have
        one end at v using at most $n_i$ steps;
    transform $\pi$ into $\pi_2|_{P_i}$ reversely.
Endfor

Table 1. Algorithm A.
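
The second phase of Algorithm A, which routes both triangulations of a cell through the fan at a chosen vertex v, can be sketched in Python. This is an illustration under assumptions not taken from the paper: the cell is the whole n-gon with vertices numbered 1..n, a triangulation is a set of pairs (a, b) with a < b, and the function names are hypothetical. The isolated-diagonal loop of the first phase is not included.

```python
def is_edge(n, diagonals, a, b):
    """True if ab is a side of the n-gon (vertices 1..n) or a diagonal."""
    a, b = min(a, b), max(a, b)
    return b - a == 1 or (a, b) == (1, n) or (a, b) in diagonals


def flips_to_fan(n, diagonals, v):
    """Flip sequence turning a triangulation into the fan at vertex v.

    Returns a list of (removed, added) diagonal pairs. Every flip adds a
    diagonal at v, so the list has n - 3 - deg(v) entries.
    """
    current = set(diagonals)
    flips = []
    while True:
        # A diagonal ab opposite v in a triangle v, a, b can be flipped
        # so that the new diagonal ends at v.
        target = next(((a, b) for (a, b) in sorted(current)
                       if v not in (a, b)
                       and is_edge(n, current, v, a)
                       and is_edge(n, current, v, b)), None)
        if target is None:
            return flips
        a, b = target
        apex = next(w for w in range(1, n + 1)
                    if w not in (a, b, v)
                    and is_edge(n, current, a, w)
                    and is_edge(n, current, b, w))
        new = (min(v, apex), max(v, apex))
        current = (current - {target}) | {new}
        flips.append((target, new))


def via_fan(n, pi1, pi2, v=1):
    """Transformation pi1 -> fan at v -> pi2 as a list of (removed, added)."""
    forward = flips_to_fan(n, pi1, v)
    backward = [(added, removed)
                for removed, added in reversed(flips_to_fan(n, pi2, v))]
    return forward + backward


if __name__ == "__main__":
    pi1 = {(1, 3), (3, 5), (1, 5)}
    pi2 = {(2, 4), (2, 5), (2, 6)}
    for removed, added in via_fan(6, pi1, pi2):
        print("flip", removed, "->", added)
```

Since each half uses at most n - 3 flips, the combined sequence has length at most 2(n - 3); this is the trivial ratio-2 baseline that the analysis in this section improves on.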



Note that Algorithm A has the following properties.

Proposition 4. (1) Flips in the Do loop do not increase the maximum degree of
the difference graph $G(\pi_1, \pi_2)$. (2) Flips in the Do loop do not increase the number
of internal triangles in $G(\pi_1, \pi_2)$.

4.2 Analysis of the algorithm 

In this section, we shall analyze the transformation algorithm given above.
Let $\pi_1$ and $\pi_2$ be two triangulations of the n-gon. Consider a sequence $\Pi$ of
diagonal-flips that transforms $\pi_1$ into $\pi_2$. A diagonal-flip $(ab, cd) \in \Pi$ is auxiliary
if $cd \notin \pi_2$. We also say that the flip $(ab, cd)$ touches the vertices a, b, c, d. Let $A(\Pi)$ denote
the set of all auxiliary diagonal-flips in $\Pi$. Then,

$$|\Pi| \ge |A(\Pi)| + n - 3. \qquad (1)$$

Inequality 1 implies that any lower bound on the cardinality of $A(\Pi)$ induces
a lower bound on $|\Pi|$. In the rest of this section, we will work on $|A(\Pi)|$ for a
transformation sequence $\Pi$ instead of its cardinality $|\Pi|$. Recall that a vertex v
is pure with respect to $\pi_1$ if it is only an end of $\pi_1$-diagonals.

Proposition 5. Let $\pi_1$ and $\pi_2$ be two triangulations of a polygon and let
$v \in G(\pi_1, \pi_2)$ be a pure vertex with respect to $\pi_1$. If flipping any diagonal adjacent
to v does not create a $\pi_2$-diagonal, then in any sequence $\Pi$ of diagonal-flips that
transforms $\pi_1$ into $\pi_2$, there is at least one auxiliary diagonal-flip touching v.

Proof. We consider the subpolygon consisting of v, its neighbors on the boundary
and its adjacent nodes. Let v be adjacent to $v_1, v_2, \ldots, v_k$, and let u and w be its
left and right neighbors on the boundary (see Figure 4). Since flipping any diagonal
$vv_j$ does not create a $\pi_2$-diagonal, $k \ge 2$.

Since all diagonals connecting v are in $\pi_1$, $uw \in \pi_2$ and $uv_1, v_iv_{i+1}, v_kw \in \pi_1$.
Consider the first flip $(xy, pq)$ touching v in the sequence $\Pi$. If pq is adjacent to
v, the flip is auxiliary because no $\pi_2$-diagonal is adjacent to v. Otherwise,
we consider the diagonal adjacent to v that is first flipped away in $\Pi$. Let such
an edge be $vv_j$. For simplicity, we assume $1 < j < k$. By assumption, $vv_{j-1}$ and
$vv_{j+1}$ are $\pi_1$-diagonals. If the flip $(vv_j, v_{j-1}v_{j+1})$ is not auxiliary, then $v_{j-1}v_{j+1}$ is
a $\pi_2$-diagonal and has cross number 1, contradicting the hypothesis.




Fig. 4. Vertex v is an end of $\pi_1$-diagonals.



A vertex v is said to be straddle with respect to $\pi_1$ if for any pair of
$\pi_1$-diagonals that are adjacent to v, there exists a $\pi_2$-diagonal adjacent to v between
them. Otherwise, it is non-straddle. By definition, a degree-2 vertex is straddle
only if it is mixed, i.e., an end of two diagonals from different triangulations.

Proposition 6. Let $\pi_1$ and $\pi_2$ be two triangulations of a polygon such that
$G(\pi_1, \pi_2)$ does not contain any isolated edges and let v be a non-straddle vertex
with respect to $\pi_1$. If (1) v is not a vertex of any internal triangle in $\pi_1$ or
$\pi_2$, (2) v is not connected with any vertex of an internal triangle in $\pi_2$, and (3)
flipping any $\pi_1$-diagonal adjacent to v does not create a $\pi_2$-diagonal, then in any
sequence $\Pi$ of diagonal-flips that transforms $\pi_1$ into $\pi_2$, there is at least one
auxiliary diagonal-flip touching v.

Recall that a vertex $v \in G(\pi_1, \pi_2)$ is pure with respect to $\pi_1$ if it is only an
end of $\pi_1$-diagonals. Let $V(\pi_1)$ and $V(\pi_2)$ denote the sets of pure vertices with
respect to $\pi_1$ and $\pi_2$ respectively.
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Lemma 7. Let tti and tv 2 be two triangulations of the n-gon such that G{tvi^ 7V2) 
does not contain any isolated edges. Then^ the flip distance between tti and tv 2 
is at least n — 3 + , 

Sketch of Proof. By Proposition 5, $f_d(\pi_1, \pi_2) \ge n - 3 + |V(\pi_1)|/4$ and
$f_d(\pi_1, \pi_2) \ge n - 3 + |V(\pi_2)|/4$.

Lemma 8. Let tti and tv 2 be two triangulations of the n-gon in which each vertex 
is an end of at most d diagonals. If G{tvi ^ 712 ) does not contain any isolated edges ^ 
then fd{7Vi , 7T2 ) < n — 3 + ~ + . 

Sketch of Proof. It can be proved that a triangle with no boundary edges 
has at least two non-straddle vertices and a triangle with one boundary edge 
has at least one non-straddle vertex. Since there are exactly n — 2 — 2|P(7T2)| 
triangles with one boundary edge and | V (7T2)| triangles with three diagonals and 
the fact any vertex can be a vertex of at most [d—1) triangles with two internal 
diagonals connecting it, there are at least n/[d— 1 ) non-straddle vertices. Since 
7Ti and 7T2 have \ V (7Ti)| — 2 and \ V (7T2)| — 2 internal triangles respectively, there 
are at least n/[d— 1 ) — 3|y(7Ti)| — 3|y(7T2)| non-straddle vertices that are not 
in any internal triangles of tti or 7T2. Further, since each vertex is an end of 
at most d 7T2 -diagonals, there are at least — 3|F(7 Ti)| — 3|F(7T2)| — {d — 
1)|F^(7T2)| non-traddle vertices with respect to tti that satisfy the conditions in 
Proposition 6. Thus, there are at least — G+‘^) \ ^G2) \ g^i^xiliary 

diagonal-flips in any sequence of diagonal-flips that transforms tti to 7T2. Thus, 
/d(7ri,7T2) > n - 3 + 4(^17 ~ Similarly, /d(7ri,7T2) > 

^ ~ ^ ^ Combining these two bounds together, 

we have n T 4^^^^^^ — (^+^)(T(^i) | +T(^2) | ) _ ^ TPig finishes the proof. 

Combining Propositions 1 and 3 with Lemmas 7 and 8, we have the following.

Theorem 9. On input of two triangulations $\pi_1$ and $\pi_2$ of the n-gon in which
each vertex is an end of at most d diagonals, the transformation algorithm
outputs a diagonal-flip transformation of length at most

$$\left(2 - \frac{2}{4(d-1)(d+6)+1}\right) f_d(\pi_1, \pi_2).$$



5 A 1.97-approximation algorithm for triangulations
without internal triangles

5.1 An upper bound

First, we present an upper bound on the flip distance between $\pi_1$ and $\pi_2$ in
terms of the number of mixed degree-2 vertices in $G(\pi_1, \pi_2)$.

Theorem 10. Let tti and tv 2 be two triangulations of the n-gon such that they 
do not contain any internal triangles. If G [tv ±^ 772 ) contains rn mixed degree-2 
vertices^ then tvi can be transformed into 712 in at most 5 [n — m) ^ flips. 
Input: Two triangulations $\pi_1$ and $\pi_2$ without internal triangles;

Do until no isolated diagonals exist in $G(\pi_1, \pi_2)$
    pick an isolated edge e and flip the diagonal intersecting it;
Enddo

Let the resulting polygon triangulations have k cells $P_i$ ($i \le k$). Let $P_i$ have
$n_i$ vertices.

For each cell $P_i$
    if $G(\pi_1|_{P_i}, \pi_2|_{P_i})$ contains more than $0.9285|P_i|$ mixed, degree-2 vertices, then
        transform $\pi_1|_{P_i}$ into $\pi_2|_{P_i}$ by Thm 10;
    else
        pick a node v;
        transform $\pi_1|_{P_i}$ into the unique triangulation $\pi$ all of whose diagonals
            have one end at v using at most $n_i$ steps;
        transform $\pi$ into $\pi_2|_{P_i}$ reversely.
Endfor

Table 2. Algorithm B. A 1.97-approximation algorithm for triangulations
without internal triangles.



The upper bound in above theorem can be improved to 5(n — m) + ^ by a more 
complicated technique. Here we are not intent on presenting the best possible 
constant. 



5.2 A 1.97-approximation algorithm 

Proposition 11. Let tti and tv 2 two triangulations of the n-gon that do not 
contain any internal triangles and let the difference graph G [ 711 ^ 712 ) does not 
contain any isolated edges. If there are rn mixed^ degree-2 vertices in G'(7ri,7T2); 
then the diagonal- flip distance fd{7Vi^7V2) is at least n — 3 T . 

Our algorithm is described in Table 2. Obviously, it is a polynomial time
algorithm. Using Proposition 11 and Theorem 10, we analyze its approximation
ratio in the same way as in Theorem 9.

Theorem 12. On the input of two triangulations $\pi_1$ and $\pi_2$ of the n-gon that
do not contain any internal triangles, Algorithm B outputs a diagonal-flip
transformation of length at most $1.97 f_d(\pi_1, \pi_2)$.
Abstract. We consider the following force field computation problem:
given a cluster of n particles in 3-dimensional space, compute the force
exerted on each particle by the other particles. Depending on the
application, the pairwise interaction could be either gravitational or
Lennard-Jones. In both cases, the force between two particles vanishes
as the distance between them approaches infinity. Since there are
$n(n-1)/2$ pairs, the direct method requires $O(n^2)$ time for force evaluation,
which is very expensive for astronomical simulations. In 1985 and 1986,
two famous $O(n\log n)$ time hierarchical tree algorithms were published
by Appel [3] and by Barnes and Hut [4] respectively. In a recent paper,
we presented a linear time algorithm which builds the oct-tree bottom-up
and showed that Appel's algorithm can be implemented in $O(n)$ sequential
time. In this paper, we present an algorithm which computes the
force field in $O(\log n)$ time using an $n/\log n$ processor CREW PRAM. A key
to this optimal parallel algorithm is replacing a recursive top-down force
calculation procedure of Appel by an equivalent non-recursive bottom-up
procedure. Our parallel algorithm also yields a new $O(n)$ time sequential
algorithm for force field computation.

Keywords: Parallel algorithms, spatial tree data structures, force field
evaluation, N-body simulations, PRAM, cost optimal algorithms.



1 Introduction and assumptions

Fast algorithms for force field evaluation have important applications in
molecular conformation, molecular dynamics, and astrophysical simulations. Given a
cluster of n particles in 3-dimensional space, we need to compute the force
exerted on each particle by the other particles. Since there are $n(n-1)/2$ pairs,
the direct method requires $O(n^2)$ time for force evaluation, which is very expensive
for astronomical simulations.
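
For reference, the direct method can be written in a few lines of Python. This is a minimal sketch of the quadratic baseline for the gravitational case only; the unit convention G = 1 and the function name are assumptions made for illustration.

```python
import math


def direct_forces(positions, masses, G=1.0):
    """Gravitational force on every particle by direct summation.

    positions: list of (x, y, z) tuples; masses: list of floats. All
    n(n-1)/2 pairs are visited once, so the running time is O(n^2).
    """
    n = len(positions)
    forces = [[0.0, 0.0, 0.0] for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = [positions[j][k] - positions[i][k] for k in range(3)]
            r = math.sqrt(sum(x * x for x in d))
            scale = G * masses[i] * masses[j] / r ** 3
            for k in range(3):
                forces[i][k] += scale * d[k]   # pull of j on i
                forces[j][k] -= scale * d[k]   # equal and opposite
    return forces


if __name__ == "__main__":
    print(direct_forces([(0, 0, 0), (2, 0, 0), (0, 3, 0)], [1.0, 4.0, 2.0]))
```

Because every pair contributes equal and opposite forces, the components summed over all particles cancel, which gives a simple consistency check for faster approximate methods.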

In astrophysical simulations, the force exerted on one particle by another
is given by the gravitational force. In molecular dynamics and molecular
conformation, the Lennard-Jones potential is widely used. In both cases,
the force exerted on one particle by another particle vanishes as the
distance between them approaches infinity. This observation leads to
several fast approximation algorithms. In 1985 and 1986, two famous
$O(n \log n)$ time hierarchical tree algorithms were published by Appel [3]
and by Barnes and Hut [4], respectively. In 1987, Greengard and Rokhlin [8]
published the fast multipole algorithm, which computes the force field in
$O(n)$ time. Recently, Aluru [1] showed that Greengard's algorithm is not
$O(n)$. These algorithms have had a great impact on the computational study
of molecular conformation/dynamics and astronomical simulations. Parallel
implementations of these algorithms have been reported by many authors,
including [7,12,13,14,17]. Because of the large constant in the fast
multipole algorithm and the simplicity and efficiency of the tree
algorithms, hierarchical tree algorithms have received more attention in
computational studies [2]. Therefore we concentrate on tree algorithms in
this paper.

DAAH04-9610233.

Wen-Lian Hsu, Ming-Yang Kao (Eds.): COCOON’98, LNCS 1449, pp. 95-104, 1998.
(c) Springer-Verlag Berlin Heidelberg 1998

Guoliang Xue

In a recent paper [16], Xue presented an algorithm which builds an oct-tree
bottom-up in $O(n)$ sequential time and showed that Appel's algorithm can
be implemented in $O(n)$ sequential time. That algorithm computes the force
field top-down using a recursive procedure, which seems difficult to
parallelize efficiently. In this paper, we replace the recursive top-down
procedure of Appel by an equivalent non-recursive bottom-up procedure and
present a parallel algorithm which computes the force field in $O(\log n)$
time using an $n/\log n$ processor CREW PRAM.

The rest of this paper is organized as follows. In section 2, we show that
the oct-tree can be constructed bottom-up in $O(\log n)$ time on an
$n/\log n$ processor CREW PRAM. In section 3, we show that the force field
can be computed in $O(\log n)$ time on an $n/\log n$ processor CREW PRAM.
We conclude the paper in section 4. Throughout this paper, we make the
following assumption on the distribution of the particles:

Assumption 1. There exist two positive constants $c_1$ and $c_2$ such that
the minimum inter-particle distance is at least $c_1$ and the maximum
inter-particle distance is smaller than $c_2 n^{1/3}$.

Assumption 1 is widely believed to be true for most applications and is
supported by many computer simulations. For the Lennard-Jones cluster, it
is proved that the minimum inter-particle distance has a positive lower
bound which is independent of the number of particles in the cluster [15].



2 Building the oct-tree bottom-up in $O(\log n)$ time

In section 2.1, we describe the data structure used in our algorithms. In
section 2.2, we present an $O(\log n)$ time algorithm for constructing the
oct-tree using a $P = n/\log n$ processor CREW PRAM. The time complexity
of our algorithm is analyzed in section 2.3. To simplify the analysis, we
assume that both $n$ and $\log n$ are powers of 8. It is well known that
this assumption does not affect the asymptotic analysis of the algorithm.

2.1 Data structures

A computation box is defined by a point base in 3-dimensional space and a
positive number size. Let baseX, baseY, baseZ be the coordinates of base.
The computation box defined by base and size is

$$[\mathit{baseX}, \mathit{baseX} + \mathit{size}) \times [\mathit{baseY}, \mathit{baseY} + \mathit{size}) \times [\mathit{baseZ}, \mathit{baseZ} + \mathit{size}). \tag{1}$$

A computation box is illustrated in Figure 1(a). When a computation box is 
partitioned, we obtain 8 non-intersecting computation boxes of equal size whose 
union is the original computation box. An example is illustrated in Figure 1(b). 




(a) level 0 (b) level 1 (c) level 2 



Fig. 1. Computation boxes associated with the first 3 levels of the oct-tree. 



We will make reference to the following data structure during our description 
of the algorithm. Each node in the oct-tree is of type NODE. 

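typedef struct _node {
    struct _node *parent;     /* parent node */
    struct _node *child[8];   /* the 8 children */
    int    isLeaf;            /* 1 if the box holds at most one particle */
    int    weight;            /* number of particles in the box */
    int    pindex;            /* index of the particle when weight == 1 */
    double coordX;            /* weighted center of the particles */
    double coordY;
    double coordZ;
    double forceX;            /* partial force accumulated at this node */
    double forceY;
    double forceZ;
    double baseX;             /* base point of the computation box */
    double baseY;
    double baseZ;
    double size;              /* edge length of the computation box */
} NODE;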

Note that the partition of a computation box takes constant time. The
computation box in Figure 1(a) is partitioned into 8 smaller computation
boxes in Figure 1(b), which in turn are partitioned into a total of 64 even
smaller computation boxes in Figure 1(c).
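
The constant-time partition can be sketched as follows. This is an
illustrative fragment rather than the paper's code: the Box type and the
function partition_box are assumed names, and the octant numbering
(child 4a + 2b + c takes the upper half in x when a = 1, in y when b = 1,
in z when c = 1) is chosen to match the child indexing used in
Algorithm 2.1.

```c
#include <assert.h>

/* A computation box [baseX, baseX+size) x [baseY, baseY+size)
 * x [baseZ, baseZ+size); the type name is an assumption. */
typedef struct {
    double baseX, baseY, baseZ;
    double size;
} Box;

/* Split parent into its 8 non-intersecting octants of equal size.
 * Octant c = 4a + 2b + d uses the upper half in x when a == 1,
 * in y when b == 1 and in z when d == 1. */
void partition_box(const Box *parent, Box child[8])
{
    assert(parent->size > 0.0);
    double half = parent->size / 2.0;
    for (int c = 0; c < 8; c++) {
        int a = (c >> 2) & 1;
        int b = (c >> 1) & 1;
        int d = c & 1;
        child[c].baseX = parent->baseX + a * half;
        child[c].baseY = parent->baseY + b * half;
        child[c].baseZ = parent->baseZ + d * half;
        child[c].size  = half;
    }
}
```

Applying partition_box to each of the 8 children of the level 0 box yields
the 64 boxes of level 2.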



2.2 Building the oct-tree bottom-up in $O(\log n)$ time

We assume that the $n$ particles are given in an array of points so that
part[i].x, part[i].y, part[i].z represent the coordinates of particle $i$
($i = 0, 1, 2, \ldots, n-1$). Our PRAM algorithm for oct-tree construction
is presented as Algorithm 2.1. Since $P$ is assumed to be a power of 8,
there is an integer $L$ such that $8^L = P$. To make the description of the
algorithm easier, we assume that the $P$ processors are labeled as
$P_{IJK}$ where $0 \le I < 2^L$, $0 \le J < 2^L$, $0 \le K < 2^L$. We have
taken the liberty of treating the processors as a linear array in Step_3
of the algorithm.

Algorithm 2.1 (part 1) {Building the oct-tree bottom-up.}

Step_1 {Determine the sizes of the root box and the leaf boxes}
    Use all P processors to compute the following 6 values:
        Xmin := min_{i=0..n-1} part[i].x;   Xmax := max_{i=0..n-1} part[i].x;
        Ymin := min_{i=0..n-1} part[i].y;   Ymax := max_{i=0..n-1} part[i].y;
        Zmin := min_{i=0..n-1} part[i].z;   Zmax := max_{i=0..n-1} part[i].z.
    Let δ be the size of a leaf box, as determined by Assumption 1.
    Let maxlevel be the smallest positive integer such that
        δ × 2^maxlevel > max{Xmax - Xmin, Ymax - Ymin, Zmax - Zmin}.
    Let Δ := δ × 2^maxlevel.

Step_2 {Allocate space}
    We will use a three dimensional array of NODE for the nodes on each
    level of the oct-tree. Let tree[l] be a pointer to the three dimensional
    array of NODE with 2^l × 2^l × 2^l elements (l = 0, 1, ..., maxlevel).
    It is clear that we need to allocate O(n) space because there are
    8^l tree nodes on level l of the tree. These arrays are dynamically
    allocated at this time.

Step_3 {Construct the leaf nodes}
    for p = 0 to P - 1, processor P_p does the following:
        for t = p to n - 1 step P do
            Let i = floor((part[t].x - Xmin)/δ), j = floor((part[t].y - Ymin)/δ),
                k = floor((part[t].z - Zmin)/δ);
            tree[maxlevel]->node[i][j][k].size   := δ;
            tree[maxlevel]->node[i][j][k].baseX  := Xmin + iδ;
            tree[maxlevel]->node[i][j][k].baseY  := Ymin + jδ;
            tree[maxlevel]->node[i][j][k].baseZ  := Zmin + kδ;
            tree[maxlevel]->node[i][j][k].weight := 1;
            tree[maxlevel]->node[i][j][k].pindex := t;
            tree[maxlevel]->node[i][j][k].coordX := part[t].x;
            tree[maxlevel]->node[i][j][k].coordY := part[t].y;
            tree[maxlevel]->node[i][j][k].coordZ := part[t].z.
        endfor {t}
    endfor {p}

Fig. 2. Building the oct-tree bottom-up (part 1)

Since the maximum and minimum of $n$ numbers can be computed in
$O(\log n)$ time on an $n/\log n$ processor PRAM, the root computation box
can be computed in $O(\log n)$ parallel time on the PRAM. By Assumption 1,
we can now decide the size of the smallest computation box as well as the
size of the largest computation box, in constant time. At that point we
know a $\frac{1}{3}\log n + O(1)$ upper bound on the height of the
oct-tree. Therefore we can dynamically allocate space for every possible
tree node. We assume that all the fields of a tree node are initialized to
zero at the time the memory is allocated. The total space allocated is
$O(n)$ since the number of nodes in a complete oct-tree of height
$\frac{1}{3}\log n + O(1)$ is $O(n)$.
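
The quantities used in Step_1 to Step_3 can be sketched in a few lines. The
fragment below is only an illustration under stated assumptions: the leaf
box size delta is passed in as a parameter (its exact value follows from
Assumption 1 and is not fixed here), and the function names
compute_maxlevel, leaf_index and total_nodes are hypothetical.

```c
#include <assert.h>

/* Smallest positive integer m such that delta * 2^m > extent, where
 * extent is max{Xmax - Xmin, Ymax - Ymin, Zmax - Zmin}. */
int compute_maxlevel(double delta, double extent)
{
    assert(delta > 0.0 && extent >= 0.0);
    int m = 1;
    double box = 2.0 * delta;        /* delta * 2^m for the current m */
    while (box <= extent) {
        box *= 2.0;
        m++;
    }
    return m;
}

/* Index of the leaf box containing coordinate coord along one axis.
 * The quotient is non-negative, so truncation equals the floor. */
int leaf_index(double coord, double min, double delta)
{
    assert(delta > 0.0 && coord >= min);
    return (int)((coord - min) / delta);
}

/* Number of nodes in a complete oct-tree with levels 0..maxlevel,
 * i.e. 1 + 8 + ... + 8^maxlevel. */
unsigned long long total_nodes(int maxlevel)
{
    assert(maxlevel >= 0 && maxlevel <= 20);   /* keeps 8^l within 64 bits */
    unsigned long long total = 0, level = 1;
    for (int l = 0; l <= maxlevel; l++) {
        total += level;
        level *= 8;
    }
    return total;
}
```

Because the level sizes form a geometric series, the total is less than
8/7 times the number of leaf boxes, which is why allocating every possible
node costs only a constant factor more than allocating the leaf level.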
Algorithm 2.1 (part 2) {Building the oct-tree bottom-up.}

Step_4 {Building the tree bottom-up}
for l := maxlevel - 1 downto L do
    All processors P_IJK (0 ≤ I, J, K < 2^L) do in parallel
    for i := I·2^(l-L) to I·2^(l-L) + 2^(l-L) - 1 do
    for j := J·2^(l-L) to J·2^(l-L) + 2^(l-L) - 1 do
    for k := K·2^(l-L) to K·2^(l-L) + 2^(l-L) - 1 do
        tree[l]->node[i][j][k].child[0] := tree[l+1]->node[2i+0][2j+0][2k+0];
        tree[l]->node[i][j][k].child[1] := tree[l+1]->node[2i+0][2j+0][2k+1];
        tree[l]->node[i][j][k].child[2] := tree[l+1]->node[2i+0][2j+1][2k+0];
        tree[l]->node[i][j][k].child[3] := tree[l+1]->node[2i+0][2j+1][2k+1];
        tree[l]->node[i][j][k].child[4] := tree[l+1]->node[2i+1][2j+0][2k+0];
        tree[l]->node[i][j][k].child[5] := tree[l+1]->node[2i+1][2j+0][2k+1];
        tree[l]->node[i][j][k].child[6] := tree[l+1]->node[2i+1][2j+1][2k+0];
        tree[l]->node[i][j][k].child[7] := tree[l+1]->node[2i+1][2j+1][2k+1];
        Also set the parent field for each of the 8 children of
            tree[l]->node[i][j][k];
        tree[l]->node[i][j][k].size  := δ × 2^(maxlevel-l);
        tree[l]->node[i][j][k].baseX := Xmin + i × δ × 2^(maxlevel-l);
        tree[l]->node[i][j][k].baseY := Ymin + j × δ × 2^(maxlevel-l);
        tree[l]->node[i][j][k].baseZ := Zmin + k × δ × 2^(maxlevel-l);
        Let tree[l]->node[i][j][k].weight be the sum of the weights of its
            children;
        Let tree[l]->node[i][j][k].coordX, tree[l]->node[i][j][k].coordY, and
            tree[l]->node[i][j][k].coordZ be the coordinates of the weighted
            center of the particles contained in the computation box of the
            current node;
        if tree[l]->node[i][j][k].weight == 0 then
            tree[l]->node[i][j][k].isLeaf := 1;
        elseif tree[l]->node[i][j][k].weight == 1 then
            tree[l]->node[i][j][k].isLeaf := 1;
            Let tree[l]->node[i][j][k].pindex be the index of the only
                particle contained in the current computation box;
        endif
    endfor {k}
    endfor {j}
    endfor {i}
endfor {l}

Fig. 3. Building the oct-tree bottom-up (part 2)
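
The per-node work of Step_4 (summing the children's weights, forming the
weighted center, and marking leaves) can be sketched as below. This is an
illustrative fragment, not the paper's implementation: Node is a reduced
version of NODE holding only the fields needed here, aggregate_children is
an assumed name, and all particles are taken to have equal mass so that
weight is simply a particle count.

```c
#include <assert.h>
#include <stddef.h>

/* Reduced node record (an assumption): a subset of the NODE fields. */
typedef struct node {
    struct node *parent;
    struct node *child[8];
    int isLeaf, weight, pindex;
    double coordX, coordY, coordZ;
} Node;

/* Fill in weight, weighted center, isLeaf and pindex of nd from its
 * 8 children, and set the children's parent pointers. */
void aggregate_children(Node *nd)
{
    int w = 0, only = -1;
    double sx = 0.0, sy = 0.0, sz = 0.0;

    for (int c = 0; c < 8; c++) {
        Node *ch = nd->child[c];
        assert(ch != NULL);
        ch->parent = nd;
        w  += ch->weight;
        sx += ch->weight * ch->coordX;
        sy += ch->weight * ch->coordY;
        sz += ch->weight * ch->coordZ;
        if (ch->weight == 1)
            only = ch->pindex;       /* candidate for the single particle */
    }

    nd->weight = w;
    if (w > 0) {                     /* weighted center of contained particles */
        nd->coordX = sx / w;
        nd->coordY = sy / w;
        nd->coordZ = sz / w;
    }
    nd->isLeaf = (w <= 1);           /* empty box or a single particle */
    if (w == 1)
        nd->pindex = only;           /* exactly one child had weight 1 */
}
```

Each call does a constant amount of work, so the cost of a level is
proportional to its number of nodes.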



Instead of inserting the particles into the tree from the root node, we
insert the particles directly into the nodes corresponding to the smallest
computation boxes. We then pass information from one layer of the tree to
the layer above, starting from the bottom layer. Although there are
$\frac{1}{3}\log n + O(1)$ layers of the tree, the amount of time required
is decreased by a factor of 8 every time we move up one layer. This is the
key to achieving the $O(n)$ sequential time complexity. For the parallel
time complexity, we have the following theorem.

Theorem 1. Algorithm 2.1 builds the oct-tree for $n$ particles using
$O(\log n)$ time and $O(n)$ space, using an $n/\log n$ processor CREW PRAM,
provided that the particles satisfy Assumption 1. The constant behind the
asymptotic notation is proportional to $(c_2/c_1)^3$. □
Algorithm 2.1 (part 3) {Building the oct-tree bottom-up.}

Step_5 {Building the tree bottom-up}
for l := L - 1 downto 0 do
    All processors P_ijk (0 ≤ i, j, k < 2^l) do in parallel
        tree[l]->node[i][j][k].child[0] := tree[l+1]->node[2i+0][2j+0][2k+0];
        tree[l]->node[i][j][k].child[1] := tree[l+1]->node[2i+0][2j+0][2k+1];
        tree[l]->node[i][j][k].child[2] := tree[l+1]->node[2i+0][2j+1][2k+0];
        tree[l]->node[i][j][k].child[3] := tree[l+1]->node[2i+0][2j+1][2k+1];
        tree[l]->node[i][j][k].child[4] := tree[l+1]->node[2i+1][2j+0][2k+0];
        tree[l]->node[i][j][k].child[5] := tree[l+1]->node[2i+1][2j+0][2k+1];
        tree[l]->node[i][j][k].child[6] := tree[l+1]->node[2i+1][2j+1][2k+0];
        tree[l]->node[i][j][k].child[7] := tree[l+1]->node[2i+1][2j+1][2k+1];
        Also set the parent field for each of the 8 children of
            tree[l]->node[i][j][k];
        tree[l]->node[i][j][k].size  := δ × 2^(maxlevel-l);
        tree[l]->node[i][j][k].baseX := Xmin + i × δ × 2^(maxlevel-l);
        tree[l]->node[i][j][k].baseY := Ymin + j × δ × 2^(maxlevel-l);
        tree[l]->node[i][j][k].baseZ := Zmin + k × δ × 2^(maxlevel-l);
        Let tree[l]->node[i][j][k].weight be the sum of the weights of its
            children;
        Let tree[l]->node[i][j][k].coordX, tree[l]->node[i][j][k].coordY, and
            tree[l]->node[i][j][k].coordZ be the coordinates of the weighted
            center of the particles contained in the computation box of the
            current node;
        if tree[l]->node[i][j][k].weight == 0 then
            tree[l]->node[i][j][k].isLeaf := 1;
        elseif tree[l]->node[i][j][k].weight == 1 then
            tree[l]->node[i][j][k].isLeaf := 1;
            Let tree[l]->node[i][j][k].pindex be the index of the only
                particle contained in the current computation box;
        endif
endfor {l}

Fig. 4. Building the oct-tree bottom-up (part 3)
3 Computing force fields in $O(\log n)$ time

Given a cluster of $n$ particles, we need to compute the potential energy
function and the force exerted on each particle by the other particles. In
many applications, the potential energy function of a cluster is the sum of
the pairwise potential functions. For details on the potential functions
and force fields, see [16].

3.1 The two-pass algorithm

After the oct-tree is constructed, Appel's algorithm can be implemented
using a bottom-up pass and a top-down pass. We assume that there is a
global variable FUNC which is initialized to 0 and is used to accumulate
the potential energy function of the cluster. We also assume that the
fields forceX, forceY and forceZ at every tree node are all initialized to
0 before the computation. These fields are used to hold partial values of
the force field during the computation.

Our algorithm is presented as Algorithm 3.1. For any two nodes A and B in
the tree, a call to procedure compGRAD(A, B) does the following:

- Compute the force exerted on each particle in A by all particles in B and
  save the value in (A.forceX, A.forceY, A.forceZ).

- Compute the potential between cluster A and cluster B and add this value
  to the global variable FUNC.

The force exerted on each particle in cluster B by the particles in cluster
A is computed in the separate call compGRAD(B, A), so each pair of clusters
contributes its potential to FUNC twice. Therefore, at the end of the
computation, FUNC is 2 times the actual potential function value and
(A.forceX, A.forceY, A.forceZ) is the force exerted by all the other
particles on the particle in node A for each leaf node A whose weight is 1.
This force is exactly the force computed by Appel's algorithm [3,16].

3.2 Time complexity

We will analyze the time complexity of Algorithm 3.1. Given any fixed
positive parameter which defines well-separateness and a tree node A, the
number of tree nodes which are on the same level as A and which are not
well separated from A is bounded by a constant that depends only on this
parameter. As a result, the innermost for loop in Step_1 of Algorithm 3.1
requires constant time. Therefore, the parallel run time of Step_1 of
Algorithm 3.1 is $O(\sum_{l=L}^{maxlevel} 8^{l-L}) = O(n/P) = O(\log n)$.

Similarly, we can show that the parallel run time of Step_4 of Algorithm
3.1 is also $O(\log n)$. In Step_2, only part of the $P$ processors are
active. The parallel run time of Step_2 of the algorithm is
$O(L) = O(\log P) = O(\log n)$. Similarly, we can show that the parallel
run time of Step_3 of Algorithm 3.1 is also $O(\log n)$. To summarize, we
have proved the following theorem.
Theorem 2. Given a cluster of $n$ particles satisfying Assumption 1, the
force field can be computed using Algorithm 3.1 in $O(\log n)$ time, on an
$n/\log n$ processor CREW PRAM. □

It is clear that Algorithms 2.1 and 3.1 yield a new linear time algorithm
for computing the force field for a cluster of $n$ particles which are
almost homogeneously distributed. In [5], Callahan and Kosaraju proved that
a size $O(n)$ sequence of well-separated decomposition can be computed in
$O(n)$ time once a fair-split tree is constructed. A fair-split tree for
$n$ particles can be constructed in $O(n \log n)$ time using the algorithm
of [5], without any restriction on the distribution of the particles.
However, Algorithm 3.1 is the first $O(\log n)$ time algorithm using
$n/\log n$ processors.

Algorithm 3.1 (part 1) {Computing force field in two passes.}

Step_1 {Gathering information bottom-up}
for l := maxlevel downto L do
    All processors P_IJK (0 ≤ I, J, K < 2^L) do in parallel
    for i := I·2^(l-L) to I·2^(l-L) + 2^(l-L) - 1 do
    for j := J·2^(l-L) to J·2^(l-L) + 2^(l-L) - 1 do
    for k := K·2^(l-L) to K·2^(l-L) + 2^(l-L) - 1 do
        if tree[l]->node[i][j][k].weight ≥ 1 then
            Let A be tree[l]->node[i][j][k].
            for all tree nodes B on level l of the tree such that
                (1) B.weight ≥ 1; (2) A and B are well separated;
                (3) the parents of A and B are not well separated do
                compGRAD(A, B);
            endfor
        endif
    endfor {k}
    endfor {j}
    endfor {i}
endfor {l}

Step_2 {Gathering information bottom-up}
for l := L - 1 downto 1 do
    All processors P_ijk (0 ≤ i, j, k < 2^l) do in parallel
        if tree[l]->node[i][j][k].weight ≥ 1 then
            Let A be tree[l]->node[i][j][k].
            for all tree nodes B on level l of the tree such that
                (1) B.weight ≥ 1; (2) A and B are well separated;
                (3) the parents of A and B are not well separated do
                compGRAD(A, B);
            endfor
        endif
endfor {l}

Fig. 5. Computing force field in two passes (part 1)
Algorithm 3.1 (part 2) {Computing force field in two passes.}

Step_3 {Pushing information top-down}
for l := 1 to L do
    All processors P_ijk (0 ≤ i, j, k < 2^l) do in parallel
        if tree[l]->node[i][j][k].weight ≥ 1 then
            Let A be tree[l]->node[i][j][k].
            for every child node B of A such that B.weight ≥ 1 do
                B.forceX = B.forceX + A.forceX;
                B.forceY = B.forceY + A.forceY;
                B.forceZ = B.forceZ + A.forceZ;
            endfor
        endif
endfor {l}

Step_4 {Pushing information top-down}
for l := L + 1 to maxlevel - 1 do
    All processors P_IJK (0 ≤ I, J, K < 2^L) do in parallel
    for i := I·2^(l-L) to I·2^(l-L) + 2^(l-L) - 1 do
    for j := J·2^(l-L) to J·2^(l-L) + 2^(l-L) - 1 do
    for k := K·2^(l-L) to K·2^(l-L) + 2^(l-L) - 1 do
        if tree[l]->node[i][j][k].weight ≥ 1 then
            Let A be tree[l]->node[i][j][k].
            for every child node B of A such that B.weight ≥ 1 do
                B.forceX = B.forceX + A.forceX;
                B.forceY = B.forceY + A.forceY;
                B.forceZ = B.forceZ + A.forceZ;
            endfor
        endif
    endfor {k}
    endfor {j}
    endfor {i}
endfor {l}

Fig. 6. Computing force field in two passes (part 2)
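
The top-down pass can be sketched sequentially as follows. The fragment is
an illustration under assumptions, not the paper's code: FNode keeps only
the fields the pass needs, the names push_down and push_down_level are
hypothetical, and the weight test is taken to be "at least 1" so that
single-particle nodes receive the force accumulated by their ancestors.

```c
#include <assert.h>
#include <stddef.h>

/* Reduced node record (an assumption): fields used by the top-down pass. */
typedef struct fnode {
    struct fnode *child[8];
    int weight;
    double forceX, forceY, forceZ;
} FNode;

/* Add the force accumulated at node a to each non-empty child of a. */
void push_down(FNode *a)
{
    if (a->weight < 1)
        return;                          /* empty box: nothing to push */
    for (int c = 0; c < 8; c++) {
        FNode *b = a->child[c];
        if (b != NULL && b->weight >= 1) {
            b->forceX += a->forceX;
            b->forceY += a->forceY;
            b->forceZ += a->forceZ;
        }
    }
}

/* Process one level of the tree. Calling this for levels
 * 1, 2, ..., maxlevel - 1 in that order performs the whole pass,
 * because a node has received its ancestors' contributions before
 * it passes its own total on to its children. */
void push_down_level(FNode *nodes, size_t count)
{
    for (size_t t = 0; t < count; t++)
        push_down(&nodes[t]);
}
```

The iterations of the loop in push_down_level are independent of one
another, which is what allows the nodes of a level to be distributed over
the processors.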



4 Conclusions 

In this paper, we have presented an $O(\log n)$ time algorithm for
computing the force field in n-body simulations using an $n/\log n$
processor CREW PRAM, improving the previous $O(n \log n)$ time sequential
algorithm of Appel. A key to this improved complexity is an $O(n)$ time
bottom-up construction of the oct-tree, which was constructed top-down
using $O(n \log n)$ time in previous studies. We have also replaced the
traditional recursive top-down force field computation with a non-recursive
bottom-up computation method. We have also studied the dependency of the
constant behind the asymptotic notation on the distribution parameters
$c_1$ and $c_2$ and on the well-separateness parameter. This analysis is
important because good software for these evaluations is badly needed in
practice. Computational studies of the proposed algorithm on existing
architectures will be reported in a forthcoming paper. Because of space
limitations, this extended abstract cannot contain all the details of the
analysis of the algorithms.
A full paper can be found at http://www.ennba.uvnn.edu/~xue.
Abstract. For problems SAT and MAX SAT, local search algorithms are
widely acknowledged as one of the most effective approaches. Most of the
local search algorithms are based on the 1-flip neighborhood, which is the
set of solutions obtainable by flipping the truth assignment of one
variable. In this paper, we consider r-flip neighborhoods for $r \ge 2$,
and propose, for $r = 2, 3$, new implementations that reduce the number of
candidates in the neighborhood without sacrificing the solution quality.
For the 2-flip (resp., 3-flip) neighborhood, we show that its expected size
is $O(n + m)$ (resp., $O(m + t^2 n)$), which is usually much smaller than
the original size $O(n^2)$ (resp., $O(n^3)$), where $n$ is the number of
variables, $m$ is the number of clauses and $t$ is the maximum number of
appearances of one variable. Computational results show that these
expected-value estimates represent the real performance well. These
neighborhoods are then used under the framework of tabu search etc., and
compared with other existing algorithms based on the 1-flip neighborhood.
The results exhibit good prospects of the proposed algorithms.

1 Introduction 

Given $n$ 0-1 variables $x_j$, $j \in N$, $m$ clauses $C_i$, $i \in M$, and
weights $w_i$ ($> 0$), $i \in M$, where $N = \{1, 2, \ldots, n\}$ and
$M = \{1, 2, \ldots, m\}$, the MAX SAT problem asks to determine a 0-1
assignment that maximizes the sum of the weights of satisfied clauses.
Denoting $\bar{x}_j = 1 - x_j$ and $L = \bigcup_{j \in N} \{x_j, \bar{x}_j\}$
(the set of literals), clauses are defined by $C_i \subseteq L$ for
$i \in M$ (e.g., $C_i = \{x_1, \bar{x}_3, x_8\}$). Without loss of
generality, we assume that at most one of $x_j$ and $\bar{x}_j$ is included
in each clause. For a $v \in \{0,1\}^n$, let

$$P_i(v) = \{ j \in N \mid x_j \in C_i \text{ and } v_j = 1, \text{ or } \bar{x}_j \in C_i \text{ and } v_j = 0 \} \tag{1}$$

and

$$d_i(v) = \begin{cases} 1, & \text{if } |P_i(v)| \ge 1 \\ 0, & \text{otherwise.} \end{cases}$$

Then the objective function to maximize is given by

$$f(v) = \sum_{i \in M} w_i d_i(v). \tag{2}$$

Wen-Lian Hsu, Ming-Yang Kao (Eds.): COCOON’98, LNCS 1449, pp. 105-116, 1998.
(c) Springer-Verlag Berlin Heidelberg 1998
The problem SAT is a decision problem asking whether there exists a
$v \in \{0,1\}^n$ that attains $f(v) = \sum_{i \in M} w_i$; such a $v$ is
called a satisfying assignment. The problems SAT and MAX SAT are known to
be NP-hard.

Many local search (abbreviated as LS) algorithms for SAT and MAX SAT
problems have been proposed [4,5,11]. The local search starts from an
initial solution $v$ and repeatedly replaces $v$ with a better solution in
its neighborhood $NB(v)$ until no better solution is found in $NB(v)$,
where $NB(v)$ is the set of solutions obtainable from $v$ by slight
perturbations. A solution $v$ is called locally optimal (with respect to
the neighborhood) if no better solution exists in $NB(v)$. We call the
replacement of the current solution $v$ by a better solution a move. One of
the following two move strategies is commonly used: the first admissible
move strategy (abbreviated as FA) and the best admissible move strategy
(abbreviated as BA). FA scans the neighborhood $NB(v)$ according to a
prespecified random order and moves to the first improved solution. BA
scans the entire neighborhood and moves to the best solution in $NB(v)$.
The local search is often applied to a number of randomly generated initial
solutions, and the best among the obtained locally optimal solutions is
output. This is called the random multi-start local search (abbreviated as
MLS).
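
The basic scheme can be made concrete with a small sketch. The fragment
below is an illustration only, not the authors' implementation: it
evaluates the objective (2) from scratch for every candidate, uses the
1-flip neighborhood with the FA strategy, and scans variables in index
order instead of a random order. The Clause layout (literal +j for $x_j$,
-j for $\bar{x}_j$) and the function names are assumptions.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed clause layout: literal +j stands for x_j, -j for its negation. */
typedef struct {
    const int *lit;   /* the literals of the clause */
    size_t len;       /* number of literals */
    double w;         /* clause weight w_i */
} Clause;

/* d_i(v): 1 if some literal of the clause is true under v (v[1..n]). */
static int clause_satisfied(const Clause *c, const int *v)
{
    for (size_t k = 0; k < c->len; k++) {
        int l = c->lit[k];
        if ((l > 0 && v[l] == 1) || (l < 0 && v[-l] == 0))
            return 1;
    }
    return 0;
}

/* f(v): total weight of satisfied clauses, as in (2). */
double objective(const Clause *cl, size_t m, const int *v)
{
    double f = 0.0;
    for (size_t i = 0; i < m; i++)
        if (clause_satisfied(&cl[i], v))
            f += cl[i].w;
    return f;
}

/* 1-flip local search with first admissible moves. v[1..n] is updated
 * in place to a locally optimal solution; its value is returned. */
double local_search_1flip(const Clause *cl, size_t m, int *v, int n)
{
    double best = objective(cl, m, v);
    int improved = 1;
    while (improved) {
        improved = 0;
        for (int j = 1; j <= n; j++) {
            v[j] = 1 - v[j];                  /* tentatively flip x_j */
            double f = objective(cl, m, v);
            if (f > best) {                   /* first improving move */
                best = f;
                improved = 1;
                break;
            }
            v[j] = 1 - v[j];                  /* undo the flip */
        }
    }
    return best;
}
```

Every candidate costs a full evaluation of $f$ here; avoiding that cost is
the purpose of the memory structure developed in Section 3.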

Let

$$D(v, v') = \{ j \in N \mid v_j \ne v'_j \}$$

and

$$NB_r(v) = \{ v' \in \{0,1\}^n \mid |D(v, v')| \le r \}$$

(i.e., $v' \in NB_r(v)$ is obtainable from $v$ by flipping at most $r$
variables). We call $NB_r(v)$ the r-flip neighborhood. To the best of the
authors' knowledge, most of the existing local search algorithms for SAT
and MAX SAT are based on the 1-flip neighborhood
[1,3,4,5,7,9,10,11,12,13,14,15].

In this paper, we consider the r-flip neighborhood for general $r$, and
propose efficient implementations for these neighborhoods, which make use
of the memory structure described in Section 3. Usually the quality of
locally optimal solutions improves if a larger neighborhood is used;
however, the computational time to search $NB_r$ increases exponentially
with $r$, since $|NB_r| = O(n^r)$ holds in general. To overcome this, we
propose, for $r = 2$ and 3, a method to reduce the number of candidates in
the neighborhood without sacrificing the solution quality.

Let us call the computation needed for determining one move in LS
one-round. Let $t$ denote the maximum number of appearances of a variable
and $\ell$ denote the maximum number of literals in a clause. If we
evaluate the objective values (2) for all the solutions in the neighborhood
from scratch, taking $O(m\ell)$ time for each evaluation, one-round time is
$O(m\ell|NB_r|) = O(m\ell n^r)$ for the r-flip neighborhood. But this
computational effort can be reduced if we use a clever memory structure.
Here we assume that each memory access takes $O(1)$ time, and that the
necessary memory cells can be stored in space linear in their number; the
precise definition and validity of this assumption will be discussed in
Section 3. With this memory structure, in the worst case, $O(2^r)$ time is
needed for each evaluation and $O(r t \ell^r)$ time is required to update
the memory structure after a move. Hence, one-round worst-case time becomes
$O(2^r |NB_r| + r t \ell^r) = O(2^r n^r + r t \ell^r)$. The size of memory
space needed for this algorithm is $O(n + m\ell^r)$.

However, the situation becomes much better if we consider the average-case
complexity. Let $r$ be a fixed constant. Then it will be shown that the
expected time needed for each update of the memory structure is $O(t)$, and
the expected memory requirement is $O(n + m)$. Furthermore, based on the
restricted 2 and 3-flip neighborhoods (without sacrificing the solution
quality), it will be shown in Section 3 that the expected size of the
2-flip neighborhood is $O(n + m)$, and that of the 3-flip neighborhood is
$O(m + t^2 n)$. Therefore, the expected one-round time with the restricted
neighborhood becomes $O(n + m)$ if $r = 2$. In the case of $r = 3$,
although additional computation is needed for the restriction of the
neighborhood, the expected one-round time is kept as $O(m + t^2 n)$. It is
also noted that, for $r = 1$ and move strategy FA, the worst case one-round
time of this algorithm is $O(t\ell)$, and its expected one-round time is
$O(t)$. For the 1-flip neighborhood, a similar result is already reported
in [3].

Computational experiments for problem instances with up to n = 1000 are 
conducted to see the effectiveness of these neighborhood restrictions. It is ob- 
served that the above probabilistic estimates represent the real performance of 
LS well. Computational experiments to evaluate the effectiveness of these neigh- 
borhoods are also conducted. We tested three metaheuristic frameworks: (1) 
Random multi-start local search (MLS), (2) Iterated local search (ILS), and (3) 
Tabu search (TS). It is observed, for some types of problems, that the proposed 
2 and 3-flip neighborhood search algorithms with restricted neighborhoods are 
more effective than 1-flip neighborhood if about the same computational time 
is allowed. The details of these experiments, as well as comparison with other 
algorithms, can be found in [16]. 

2 Preliminaries 

For convenience, let

$$CI_j = \{ i \in M \mid x_j \in C_i \text{ or } \bar{x}_j \in C_i \}, \tag{3}$$

$$t = \max_{j \in N} |CI_j|,$$

$$VI_i = \{ j \in N \mid x_j \in C_i \text{ or } \bar{x}_j \in C_i \}, \tag{4}$$

$$\ell = \max_{i \in M} |VI_i| = \max_{i \in M} |C_i|,$$

where $t$ denotes the maximum number of appearances of a variable and
$\ell$ denotes the maximum number of literals in a clause. These always
satisfy $t \le m$ and $\ell \le n$, and in most cases $t \ll m$ and
$\ell \ll n$. For a subset $S \subseteq N$ and a vector $v \in \{0,1\}^n$,
let $v \oplus S$ denote the vector obtained from $v$ by flipping the 0-1
assignment of variables in $S$ (i.e., $D(v, v \oplus S) = S$), and call the
following increase in the objective function the flip gain,

$$\Delta f(v, S) = f(v \oplus S) - f(v) = \sum_{i \in M} \Delta f_i(v, S), \tag{5}$$

where

$$\Delta f_i(v, S) = w_i \{ d_i(v \oplus S) - d_i(v) \}.$$



3 LS Implementation with Memory 

In this section, we propose an efficient method to evaluate solutions in
the neighborhood by making use of the memory structure. The memory stores
the information needed to evaluate each flip gain in $O(1)$ time. Such
information is also used to reduce the number of candidates in the
neighborhood, as will be explained in Sections 3.2 and 3.3.

3.1 Calculation of the Flip Gain

In this subsection, we propose an algorithm that calculates all flip gains
(5) in one-round in $O(2^r |NB_r| + r t \ell^r)$ time in the worst case.

Intuitively, we first prepare flip gains of all candidates in the 1-flip
neighborhood. For the candidates in the 2-flip neighborhood, instead of
storing the flip gains, we evaluate the flip gain
$\Delta f(v, \{j_1, j_2\})$ by the sum of $\Delta f(v, \{j_1\})$,
$\Delta f(v, \{j_2\})$ and the adjustment $g(v, \{j_1, j_2\})$, whose
definition will be given by (7) later. For example, consider a clause $C_i$
where $j_1, j_2 \in VI_i$ and $P_i(v) = \emptyset$. In this case,
$\Delta f_i(v, \{j_1\}) = \Delta f_i(v, \{j_2\}) = w_i$ holds, and we need
the adjustment term $-w_i$ so that we obtain
$\Delta f_i(v, \{j_1, j_2\}) = w_i + w_i - w_i = w_i$.
The value $g(v, \{j_1, j_2\})$ represents the sum of such adjustments for
all clauses $C_i$. By storing such adjustments, each solution in the 2-flip
neighborhood can be evaluated in $O(1)$ time. The important point is that
$g(v, \{j_1, j_2\})$ takes a nonzero value only if $j_1, j_2 \in VI_i$
holds for some clause $C_i$. As $\ell$ is usually small, such cases occur
rather rarely. For the same reason, the update of such
$g(v, \{j_1, j_2\})$ values at each move will not be expensive. This idea
can be generalized to the case of $r \ge 3$ by using the principle of
inclusion and exclusion.

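In general, let

$$g_i(v, S) = \begin{cases} (-1)^{|S| - |P_i(v)| + 1} w_i, & \text{if } P_i(v) \subseteq S \subseteq VI_i \\ 0, & \text{otherwise,} \end{cases} \tag{6}$$

and

$$g(v, S) = \sum_{i \in M} g_i(v, S) = \sum_{i \in \bigcap_{j \in S} CI_j \text{ s.t. } P_i(v) \subseteq S} g_i(v, S). \tag{7}$$

Here, it is noted that $g(v, \{j\}) = \Delta f(v, \{j\})$ holds for all
$j \in N$.

Lemma 1. The flip gain can be obtained by

$$\Delta f(v, S) = \sum_{S' \subseteq S,\ S' \ne \emptyset} g(v, S'). \tag{8}$$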
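
The inclusion-exclusion identity can be checked by brute force on small
instances. The sketch below is illustrative rather than the authors' code.
It assumes the signed form
$g_i(v, S) = (-1)^{|S| - |P_i(v)| + 1} w_i$ for
$P_i(v) \subseteq S \subseteq VI_i$ (and 0 otherwise), which agrees with
the $-w_i$ adjustment in the example of this subsection, and it computes
the flip gain both directly from (5) and as the sum of $g(v, S')$ over the
nonempty subsets $S' \subseteq S$. Variables are numbered from 0 and sets
are bitmasks; the Clause layout and all function names are assumptions.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed clause layout: bit j of pos (neg) is set when x_j
 * (its negation) occurs in the clause. */
typedef struct {
    unsigned pos, neg;
    int w;                       /* integer weight, so comparisons are exact */
} Clause;

static int popcount(unsigned s)
{
    int c = 0;
    while (s) { c += (int)(s & 1u); s >>= 1; }
    return c;
}

/* P_i(v): the variables whose literal in the clause is true under v. */
static unsigned sat_vars(const Clause *c, unsigned v)
{
    return (c->pos & v) | (c->neg & ~v);
}

/* f(v): total weight of satisfied clauses. */
int objective(const Clause *cl, size_t m, unsigned v)
{
    int f = 0;
    for (size_t i = 0; i < m; i++)
        if (sat_vars(&cl[i], v) != 0)
            f += cl[i].w;
    return f;
}

/* g_i(v,S) in the assumed signed form. */
int g_single(const Clause *c, unsigned v, unsigned S)
{
    unsigned P = sat_vars(c, v), VI = c->pos | c->neg;
    if ((P & ~S) != 0 || (S & ~VI) != 0)
        return 0;                            /* not P subset S subset VI */
    int e = popcount(S) - popcount(P) + 1;   /* exponent of -1, at least 1 */
    return (e % 2 == 0) ? c->w : -c->w;
}

/* g(v,S): sum of g_i(v,S) over all clauses. */
int g(const Clause *cl, size_t m, unsigned v, unsigned S)
{
    int total = 0;
    for (size_t i = 0; i < m; i++)
        total += g_single(&cl[i], v, S);
    return total;
}

/* Flip gain directly from the definition: f(v xor S) - f(v). */
int flip_gain_direct(const Clause *cl, size_t m, unsigned v, unsigned S)
{
    return objective(cl, m, v ^ S) - objective(cl, m, v);
}

/* Flip gain as the sum of g(v,S') over the nonempty subsets S' of S. */
int flip_gain_by_lemma(const Clause *cl, size_t m, unsigned v, unsigned S)
{
    int total = 0;
    for (unsigned sub = S; sub != 0; sub = (sub - 1) & S)
        total += g(cl, m, v, sub);
    return total;
}
```

For $|S| \le r$ the subset loop has fewer than $2^r$ iterations, which is
the source of the $O(2^r)$ evaluation time once the values $g(v, S')$ are
available in constant time.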
The important point is that we keep only nonzero $g(v, S)$. Here we assume
the following property, which is usually valid (e.g., by using a hash
technique). More discussion is found in [16].

Assumption 1. Each access to $g(v, S)$ is possible in $O(1)$ time and all
nonzero $g(v, S)$ necessary for neighborhood $NB_r(v)$ can be stored in
$O(|\{S \subseteq N \mid g(v, S) \ne 0 \text{ and } |S| \le r\}|)$ space.

Lemma 2. [16] If the current solution $v$ is randomly chosen from
$\{0,1\}^n$ with probability $1/2^n$, then, under Assumption 1, the
expected number of nonzero $g(v, S)$ necessary for neighborhood $NB_r(v)$
is $O(n + m)$ for any constant $r$.

Denote by $g(S)$ the value of $g(v, S)$ stored in the memory for the
current solution $v$. The update of $g(S)$ for the move from $v$ to $v'$
with $|D(v, v')| \le r$ is done as follows.

Algorithm UPDATE
for each i ∈ ∪_{j ∈ D(v,v')} CI_j do
    for each S such that P_i(v) ⊆ S ⊆ VI_i and 1 ≤ |S| ≤ r do
        Set g(S) := g(S) - g_i(v, S)
    end
    for each S such that P_i(v') ⊆ S ⊆ VI_i and 1 ≤ |S| ≤ r do
        Set g(S) := g(S) + g_i(v', S)
    end
end.

The computation time needed for this is $O(r t \ell^r)$ in the worst case.
By (8), $\Delta f(v, S)$ can be calculated in $O(2^r)$ time for each $S$
with $|S| \le r$. Therefore, one-round time is
$O(2^r |NB_r| + r t \ell^r) = O(n^r 2^r + r t \ell^r)$ in the worst case.

Lemma 3. [16] If the current solution $v$ is randomly chosen from
$\{0,1\}^n$ with probability $1/2^n$, then, under Assumption 1, the
expected time to update $g(S)$ is $O(t)$ for any constant $r$, and the
expected one-round time is $O(|NB_r| + t) = O(n^r + t)$.

Let us also consider the computational time for the 1-flip neighborhood. If
we store all the 1-flip candidates with $g(v, \{j\}) > 0$ (all of which
give better solutions) in a linked list, it is possible in $O(1)$ time to
find a better solution in $NB_1(v)$ or to conclude that $v$ is locally
optimal. The update of such a list can be executed in $O(1)$ time for each
change of a $g(v, \{j\})$ value. Therefore, one-round time for LS with the
1-flip neighborhood using the FA strategy is $O(t\ell)$ in the worst case
and $O(t)$ in the expected case.

3.2 Restriction of the 2-Flip Neighborhood 

In this subsection, we derive a condition that reduces the number of candidates 
in 2-flip neighborhood without sacrificing the solution quality. The following 
lemma is immediate from (8). 
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Lemma 4. IfAf{v,{ji}) < 0 and {^ 2 }) ^ hold for the current solution 

V, then {ji, J 2 }) > 0 possible only if g{v,{ji,j2}) > 0 holds. 

Define 

I g{v,S) > 0 and IS”! < 2}. 

Then, by the above lemma, we have only to check $NB'_2(v)$ to find a better
solution in $NB_2(v)$ or to conclude local optimality. To do this efficiently, we
store all the elements in $\{S \subseteq N \mid g(v, S) > 0 \text{ and } |S| \le 2\}$ in a
linked list; thereby all candidates in $NB'_2(v)$ can be scanned in
$O(|NB'_2(v)|)$ time. The update of this list is possible in $O(1)$ time for each
change of the $g(v, S)$ value. Therefore, the worst case one-round time of LS
using this restricted 2-flip neighborhood is
0(|yVi^2 1 + it® expected one-round time is 0(|yVi^2l + 1). As for the size 

of NB' 2 ^ we have the following lemma. 

Lemma 5. [16] | Ai^ 2 (^)l = t2(n + m^) holds for any v. If we assume that the 
current solution $v$ is randomly chosen from $\{0,1\}^n$ with probability $1/2^n$, then
$E(|NB'_2(v)|) = O(n + m)$.

Based on this lemma, we have the following theorem. 

Theorem 6. [16] Under Assumption 1, the LS with the restricted 2- flip neigh- 
borhood N B 2 requires 0(n + m^ + t£^) one-round time and 0(n+m£^) memory 
space in the worst case. If we assume that the current solution $v$ is randomly
chosen from $\{0,1\}^n$ with probability $1/2^n$, then the expected one-round time and
the expected memory size both become $O(n + m)$.

3.3 Restriction of the 3-Flip Neighborhood 

In this subsection, we derive a condition that reduces the number of candidates
in the 3-flip neighborhood without sacrificing the solution quality.

Lemma 7. [16] If $\Delta f(v, \{j_a\}) \le 0$ for all $a \in \{1,2,3\}$ and
$\Delta f(v, \{j_a, j_b\}) \le 0$ for all $a, b \in \{1,2,3\}$ with $a \ne b$ hold, then
$\Delta f(v, \{j_1, j_2, j_3\}) > 0$ is possible only if at least one of the following
two conditions holds: (1) $g(v, \{j_1, j_2, j_3\}) > 0$; (2) $g(v, \{j_a, j_b\}) > 0$ and
$g(v, \{j_b, j_c\}) > 0$ for some $a, b, c \in \{1,2,3\}$ ($a$, $b$ and $c$ are all
distinct).

Let 

$B_2(v) = \{\{j_1, j_2, j_3\} \subseteq N \mid g(v, \{j_1, j_2\}) > 0 \text{ and } g(v, \{j_2, j_3\}) > 0\}$ (9)



and 



NB'flv) = {vis \ g{v,S)>0 and \S\ < 3} U [vlS\ S e B2{v)} . 

Then, by Lemma 7, we have only to check $NB'_3(v)$ to find a better solution
in $NB_3(v)$ or to conclude local optimality. The candidates in $NB'_3(v)$ can be
efficiently scanned by using a linked list as in the case of $NB'_2$, along with an
additional data structure, whose details are omitted here (see [16]). As for the
size of $NB'_3$, we have the following lemma.






Lemma 8. [16] \N B^^{v)\ = 0{m£^ + holds for any v. If we assume 

that the solution $v$ is randomly chosen from $\{0,1\}^n$ with probability $1/2^n$, then
$E(|NB'_3(v)|) = O(m + t^2 n)$ holds.

Based on this lemma, we have the following theorem. 

Theorem 9. [16] Under Assumption 1, the LS with the restrieted 3- flip neigh- 
borhood requires 0{m£‘^ + E£f^n) one-round time and memory spaee in the 
worst case. If we assume that the current solution $v$ is randomly chosen from
$\{0,1\}^n$ with probability $1/2^n$, then the expected one-round time and the expected
memory space both become $O(m + t^2 n)$.

4 Frameworks of ILS and TS 

Based on Theorems 6 and 9, the LS with the restricted 2 and 3-flip neighbor- 
hoods can be implemented so that they become much faster and more practical 
than those with the ordinary 2 and 3-flip neighborhoods. We implemented ran- 
dom multi-start local search (MLS), iterated local search (ILS), and tabu search 
(TS), using the ordinary 1-flip neighborhood and the restricted 2 and 3-flip neigh- 
borhoods. Since the main target of our experiment is to compare the effect of 
neighborhoods, simple implementations are employed for the above three search 
algorithms, rather than pursuing algorithmic perfection (e.g., long-term memory 
is not incorporated in TS). The ILS and TS in our experiment are described as 
follows. 

Algorithm ILS

1. Randomly generate a solution $v$. Set $v^* := v$.

2. Improve $v$ by applying LS with $NB'_r$, and let $v'$ be the obtained locally
optimal solution.

3. If $f(v') > f(v^*)$, set $v^* := v'$.

4. If the computational time exceeds the given bound, output $v^*$ and stop;
otherwise, randomly choose a solution $v'$ in $NB_\kappa(v^*)$ ($\kappa$ is a prespecified
integer), let $v := v'$ and return to Step 2.
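The four steps of ILS can be sketched in a few lines. The sketch below is a simplified illustration and not the implementation used in the experiments: it substitutes a plain 1-flip first-improvement local search for LS with the restricted neighborhoods, replaces the time bound by a fixed number of rounds, and perturbs by flipping exactly `kappa` randomly chosen variables. The function names and the clause encoding (signed 1-based integers) are assumptions made for the example.

```python
import random

def value(clauses, weights, v):
    """Total weight of satisfied clauses; literal +j / -j refers to variable j."""
    return sum(w for c, w in zip(clauses, weights)
               if any((lit > 0) == v[abs(lit) - 1] for lit in c))

def local_search(clauses, weights, v):
    """1-flip first-improvement local search (stand-in for LS in Step 2)."""
    v = list(v)
    best = value(clauses, weights, v)
    improved = True
    while improved:
        improved = False
        for j in range(len(v)):
            v[j] = not v[j]
            f = value(clauses, weights, v)
            if f > best:
                best, improved = f, True
            else:
                v[j] = not v[j]  # undo a non-improving flip
    return v, best

def ils(clauses, weights, n, kappa, rounds, seed=0):
    rng = random.Random(seed)
    v = [rng.random() < 0.5 for _ in range(n)]            # Step 1
    v_star, f_star = list(v), value(clauses, weights, v)
    for _ in range(rounds):                               # stands in for the time bound
        v_loc, f_loc = local_search(clauses, weights, v)  # Step 2
        if f_loc > f_star:                                # Step 3
            v_star, f_star = v_loc, f_loc
        v = list(v_star)                                  # Step 4: perturb v*
        for j in rng.sample(range(n), kappa):
            v[j] = not v[j]
    return v_star, f_star
```

For example, `ils([[1, 2], [-1, 3], [-2, -3], [1, -3]], [2, 1, 3, 1], n=3, kappa=1, rounds=20)` returns the best assignment found together with its weight.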

Algorithm TS

1. Randomly generate a solution $v$. Set $v^* := v$, and $TL := \emptyset$.

2. Find $v' \in NB'_r(v) \setminus TL$ with $f(v') > f(v)$ or $v' \in NB'_r(v)$ with
$f(v') > f(v^*)$ using FA strategy. If no such solution exists, let $v'$ be the
best solution in $NB_1(v) \setminus TL$.

3. Set $v := v'$ and $TL := \{v\} \cup \{$solutions obtainable from $v$ by flipping at
least one variable flipped in the last $\tau$ moves$\}$ ($\tau$ is a prespecified
integer). If $f(v') > f(v^*)$, set $v^* := v'$.

4. If the computational time exceeds the given bound, output $v^*$ and stop;
otherwise, return to Step 2.
In Step 2 of TS, we restrict the degrading moves to the 1-flip neighborhood. The
reason is as follows. If $f(v') \le f(v)$ for all $v' \in NB'_r(v) \setminus TL$, the best
solution in $NB_r(v) \setminus TL$ might be included in $NB_r(v) \setminus NB'_r(v)$. In other
words, the restricted neighborhood is valid only for improving moves. Therefore,
we avoid the use of the restricted neighborhood $NB'_r$ ($r = 2, 3$) for degrading
moves in Step 2 of TS.

5 Experimental Results 

The algorithms were coded in C and run on a Sun Ultra 2 Model 2300 workstation
(300 MHz, 1 GB memory). We tested four types of problem
instances: (1) randomly generated instances with unit weights (RNDU), (2) randomly
generated instances with general weights (RNDW), (3) instances derived
from the set covering problem (SCP) and (4) an instance derived from a time
tabling problem (TTP). RNDU instances were generated according to [6]. The
clauses for RNDW instances were similarly generated, and the integer weights
were randomly chosen from [1, 1000] as described in [10]. SCP instances were
transformed from the 10 set covering instances scp41 to scp410, which were taken
from the OR-Library web site. The description of the TTP instance is in [1].

Table 1 shows the average sizes of the restricted 2 and 3-flip neighborhoods for
RNDW instances. The data are averages over 10 independent runs of LS starting
from randomly generated solutions. The ratios between the observed values and
the theoretical expectations (i.e., $E(|NB'_2(v)|) = O(n + m)$ and
$E(|NB'_3(v)|) = O(m + t^2 n)$), and the ratios between the observed values and the
ordinary 2 and 3-flip neighborhood sizes (i.e., $\binom{n}{2}$ and $\binom{n}{3}$,
respectively), are also shown. It is observed that the ratio between the observed
values and the theoretical expectation is almost constant, which supports the
theory. It is also observed that the restriction is quite effective, especially
for larger $n$.
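The ratio columns of Table 1 can be rechecked from the reported averages. The sketch below assumes that the denominators are n + m and C(n,2) for the 2-flip case and m + t^2 n and C(n,3) for the 3-flip case; these choices are inferred from the printed ratios, and the helper name `ratios` is hypothetical.

```python
from math import comb

# (n, m, t, average |NB'_2|, average |NB'_3|) as reported in Table 1.
ROWS = [
    (100, 850, 63, 470.86, 5488.72),
    (100, 1150, 76, 617.78, 8714.01),
    (100, 2300, 138, 1088.08, 22784.83),
    (1000, 7700, 60, 4637.18, 56335.04),
    (1000, 11050, 84, 6493.01, 102766.59),
    (1000, 22100, 145, 12453.54, 343105.15),
]

def ratios(n, m, t, nb2, nb3):
    """Observed size divided by the expected-size estimate and by the full size."""
    return (nb2 / (n + m), nb2 / comb(n, 2),
            nb3 / (m + t * t * n), nb3 / comb(n, 3))

for row in ROWS:
    print(row[:3], ["%.4f" % x for x in ratios(*row)])
```

The first and third values of each printed row stay within a narrow band (roughly 0.45 to 0.54 and 0.012 to 0.016), which is the near-constant behavior the text refers to.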

We then compare the performance of the restricted 2 and 3-flip neighborhoods
with that of the 1-flip neighborhood for MLS, ILS and TS. For comparison
purposes, we also show the results of two local search algorithms called GSAT
[11] and WALKSAT (abbreviated as WSAT) [14], tabu search for the constraint
satisfaction problem (abbreviated as TS-CSP) [8], and an exact algorithm for
the SAT problem called POSIT [2]. The codes of GSAT, WSAT and POSIT were
taken from the web sites of the respective authors, and the code of TS-CSP was
provided by its authors. GSAT, WSAT and TS-CSP are all based on the 1-flip
neighborhood. ILS, TS and WSAT include parameters $\kappa$, $\tau$ and $p$, respectively,
which are carefully tuned. (The parameters $\kappa$ and $\tau$ are explained in Section 4,
and $p$ is the parameter called noise in the program code, which is also written
as $p$ in [14].) Their values are also shown in the tables.

Table 2 shows the average computational time of the six tested algorithms for
problem SAT (i.e., to find satisfying assignments) with RNDU instances of $n =
1000$ and $m = 7700$, where all instances have satisfying assignments. MLS with
$r = 1, 2, 3$ and TS with $r = 3$ could not find any satisfying assignment for all of
the 10 instances; and hence, the results are omitted. The results indicate that 1- 
Table 1. Average sizes of the restricted neighborhoods for RNDW instances.

                           2-flip neighborhood                        3-flip neighborhood
    n      m   ℓ    t   avr. |NB'_2|  |NB'_2|/(n+m)  |NB'_2|/C(n,2)   avr. |NB'_3|  |NB'_3|/(m+t^2 n)  |NB'_3|/C(n,3)
  100    850  14   63        470.86          0.496           0.095        5488.72             0.0138          0.0339
  100   1150  12   76        617.78          0.494           0.125        8714.01             0.0151          0.0539
  100   2300  15  138       1088.08          0.453           0.220       22784.83             0.0119          0.1409
 1000   7700  14   60       4637.18          0.533           0.009       56335.04             0.0156          0.0003
 1000  11050  15   84       6493.01          0.539           0.013      102766.59             0.0145          0.0006
 1000  22100  15  145      12453.54          0.539           0.025      343105.15             0.0163          0.0021



Table 2. Average computational time in seconds of six algorithms for SAT with
10 RNDU instances (n = 1000 and m = 7700). The numbers in parentheses
denote how many instances were given satisfying assignments within 300 seconds.

              ILS                           TS
 r = 1      r = 2      r = 3      r = 1      r = 2      GSAT      WSAT       TS-CSP    POSIT
 κ = 16     κ = 16     κ = 16     τ = 160    τ = 180              p = 0.5
 49.5 (8)   113.6 (8)  134.4 (4)  22.0 (9)   82.5 (8)   37.7 (8)  0.13 (10)  37.5 (9)  3.7 (10)



flip neighborhood is more effective than the proposed 2 and 3-flip neighborhoods,
and WSAT and POSIT are much more effective than the other tested algorithms.

Tables 3-5 show the average errors in % from the trivial bound $\sum_{i \in M} w_i$,
i.e.,

error of a solution $v = \left(1 - \frac{f(v)}{\sum_{i \in M} w_i}\right) \times 100$ (%),

for RNDU, RNDW and SCP instances, respectively, and Table 6 shows the
number of unsatisfied clauses for the TTP instance. It is known that no satisfying
assignment exists for any of these instances. For such instances, the algorithm
POSIT just answers 'no' and does not seek an assignment with maximum
weight; hence, its results are omitted. Here we note that GSAT and WSAT were
originally developed for unweighted instances, so weights are
represented by duplicating each clause $C_i$ $w_i$ times.

Table 3 shows the results for 10 RNDU instances of $n = 1000$ and $m = 11050$.
We can observe that the 3-flip neighborhood gives the best results for MLS, and
2-flip gives the best results for ILS; however, 1-flip is the best choice for TS.
It is also observed that the performance of TS with $r = 1, 2$ is better than that
of GSAT, WSAT and TS-CSP. Similar results are observed in Table 4 for RNDW
instances of $n = 1000$ and $m = 11050$.

Table 5 shows the results for 10 SCP instances of $n = 1000$ and $m = 1200$.
We can observe that the 3-flip neighborhood gives the best results for all of MLS,
ILS and TS, among which ILS with the 3-flip neighborhood gives the best
performance.
Table 3. Results for 10 RNDU instances of n = 1000 and m = 11050.

 algorithm       r   error (%)       #moves   CPU secs.
 MLS             1      1.2606    2079353.8       300.0
 MLS             2      0.8154     297311.8       300.2
 MLS             3      0.7231      15367.3       303.7
 ILS (κ = 8)     1      0.5113    1736761.0       300.0
 ILS (κ = 16)    2      0.5005     117468.7       300.0
 ILS (κ = 16)    3      0.5213       3285.6       301.0
 TS (τ = 80)     1      0.4027    2022823.5       300.0
 TS (τ = 80)     2      0.4443      33335.2       300.0
 TS (τ = 80)     3      0.6271        992.2       300.7
 GSAT                   0.5484    5000000.0       290.9
 WSAT (p = 0)           0.4525   25000000.0       306.5
 TS-CSP                 0.4606      36374.7       300.0



Table 4. Results for 10 RNDW instances of n = 1000 and m = 11050.

 algorithm       r   error (%)       #moves   CPU secs.
 MLS             1      0.6461    2349333.3       300.0
 MLS             2      0.3525     320681.5       300.3
 MLS             3      0.3056      14760.5       305.2
 ILS (κ = 16)    1      0.2825    1956733.9       300.0
 ILS (κ = 64)    2      0.2244     208575.2       300.1
 ILS (κ = 16)    3      0.2437       4724.1       302.4
 TS (τ = 120)    1      0.1923     568919.4       300.0
 TS (τ = 120)    2      0.2205      26602.0       300.1
 TS (τ = 140)    3      0.2823       1229.1       301.4
 GSAT                   0.7348      15000.0       385.8
 WSAT (p = 20)          0.2733     100000.0       390.6
 TS-CSP                 0.2812                    540.0



Table 6 shows the results for the TTP instance of $n = 900$ and $m = 236539$.
This instance is a real-world time tabling problem [1]. The best result reported
in [1] is also indicated as CIKM for comparison. We can observe that the 3-flip
neighborhood is best for MLS, and 2-flip is best for ILS and TS. The overall
best performance is given by ILS with the 2-flip neighborhood, which is competitive
with TS-CSP. Here we note that the CSP formulation of this instance has
900 0-1 variables and 774 constraints, which is much smaller than the MAX
SAT formulation.

In summary, we have the following observations.



1. For randomly generated instances, 1-flip neighborhood under the framework 
of TS and WSAT gives the best performance. 
Table 5. Results for 10 SCP instances of n = 1000 and m = 1200.

 algorithm       r   error (%)       #moves   CPU secs.
 MLS             1      6.8618   14830875.3       300.0
 MLS             2      1.2164    2698703.0       300.0
 MLS             3      0.7315      51890.7       301.0
 ILS (κ = 8)     1      0.7606    4369174.8       300.0
 ILS (κ = 16)    2      0.7261     186745.3       300.0
 ILS (κ = 16)    3      0.7223       4437.3       300.8
 TS (τ = 160)    1      1.0586     384690.2       300.0
 TS (τ = 100)    2      0.8396      82043.7       300.0
 TS (τ = 50)     3      0.7418       2076.2       300.7
 GSAT                   1.5009     600000.0       461.4
 WSAT (p = 0)           0.7797    6000000.0       368.7
 TS-CSP                 0.7327     210691.8       300.0
 optimal                0.7214







Table 6. Results for the TTP instance of n = 900 and m = 236539.

 algorithm       r   #unsat    #moves   CPU secs.
 MLS             1      124    416164        1500
 MLS             2      105    207052        1500
 MLS             3       99     33357        1500
 ILS (κ = 4)     1       86    490890        1500
 ILS (κ = 1)     2       85    126068        1500
 ILS (κ = 1)     3       91      5711        1500
 TS (τ = 20)     1       97    788431        1500
 TS (τ = 40)     2       88    147049        1500
 TS (τ = 10)     3       94      4599        1500
 CIKM                    93   5000000           -
 GSAT                   125   3000000        2497
 WSAT (p = 0)            94   3000000        1517
 TS-CSP (a)              85    101555        1800

(a) Size of CSP formulation: 900 0-1 variables, 774 constraints.



2. For problems with structures, such as the set covering problem and the 
time tabling problem, the restricted 2 and 3-flip neighborhoods exhibit good 
prospects under various metaheuristic frameworks. 



6 Conclusion 

In this paper, we proposed efficient implementations of 2 and 3-flip neighborhoods
for MAX SAT. It is shown that the expected one-round time and
the memory space for the 2-flip (resp., 3-flip) neighborhood is $O(n + m)$ (resp.,
$O(m + t^2 n)$). It is also shown that the expected one-round time for the 1-flip
neighborhood with the first admissible (FA) move strategy is $O(t)$. The computational
results on random instances with up to $n = 1000$ indicate that these
expected-case estimates represent the real performance of LS well. The computational
experiments for four types of problem instances show that the proposed 2 and
3-flip neighborhoods are effective for structured problems, while the 1-flip
neighborhood gives better performance for random instances.
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Abstract. We consider the problem of uniform generation of random 
integers in the range [1, n] given only a binary source of randomness. 
Standard models of randomized algorithms (e.g. probabilistic Turing 
machines) assume the availability of a random binary source that can 
generate independent random bits in unit time with uniform probability. 
This makes the task trivial if n is a power of 2. However, exact uniform 
generation algorithms with bounded run time do not exist if n is not a 
power of 2. 

We analyze several almost-uniform generation algorithms and discuss 
the tradeoff between the distance of the generated distribution from the 
uniform distribution, and the number of operations required per random 
number generated. In particular, we present a new algorithm which is 
based on a circulant, symmetric, rapidly mixing Markov chain. For a 
given positive integer $N$, the algorithm produces an integer $i$ in the range
$[1, n]$ with probability $p_i = p_i(N)$ using $O(N \log n)$ bit operations such
that $|p_i - 1/n| < c\,\beta^N$ for some constant $c$, where $\beta \approx 0.4087$.



This rate of convergence is superior to the estimates obtainable by com- 
monly used methods of bounding the mixing rate of Markov chains such 
as conductance, direct canonical paths, and couplings. 

Keywords: Random number generation, uniform distribution, Markov 
chain, rapid mixing, eigenvalue, circulant matrix. 

1 Introduction 

We consider the generation of almost-uniform random integers in the range [1, n], 
taking into account the required time, space, and number of random bits. The 

Part of the work was done at the International Computer Science Institute (ICSI), 
Berkeley, California 
basic assumption is that independent random bits can be generated in unit
time. If $n$ is an exact power of 2, say $n = 2^m$, then the generation of a uniformly
distributed random integer in the range $[1, n]$ is easily accomplished in time
$O(m) = O(\log n)$ by generating $m$ consecutive random bits. However, if $n$ is not
a power of 2, no algorithm with bounded running time can generate numbers in
$[1, n]$ from the exact uniform distribution (see below).

The task of generating uniformly distributed random elements of a set whose 
size is not an exact power of two arises frequently in the study of randomized 
algorithms and is usually treated as a primitive operation. This is in part justified 
by the fact that simple and efficient almost-uniform generation algorithms are 
known. However, it appears that the exact costs and trade-offs between accuracy 
and required resources of these algorithms have not been analyzed in detail. 
One of our aims is to explore which options exist and to compare their costs. 
We present two new algorithms - one based on a rapidly mixing Markov chain 
and one based on a reduction from approximate counting to almost uniform 
generation - and compare their resource requirements with those of the well- 
known modular algorithms. 

Sinclair [18] considers the problem on an abstract level, and shows polyno- 
mial time equivalence between almost uniform generation on probabilistic Turing 
machines and on a different machine model which allows biased coin flips. 

One important class of applications which requires uniform generators for
sets of arbitrary size is the simulation of heat bath Markov chains (cf. [5] for
a precise definition). In practice, the size of the sets can be extremely large 
[15]. Heat bath Markov chains are one of the standard tools in computational 
physics, and are used frequently in high-precision numerical simulations. It is 
easy to show that a bias in the distribution of the generator translates directly 
into a similar bias in the output distribution of the Markov chain. 

We present a new algorithm which is based on the simulation of a rapidly- 
mixing circulant Markov chain. Its analysis gives a direct bound on the second- 
largest eigenvalue of the transition matrix of the Markov chain and is of interest 
in its own right. In particular, we observe that the commonly used methods of 
bounding the mixing rate of Markov chains (conductance [18], direct canonical 
paths [17], couplings [2]), yield weaker bounds than the one obtained here. Di- 
rect bounds on the second-largest eigenvalue of transition matrices have been ob- 
tained previously, mostly based on algebraic properties of the underlying domain 
(e.g. [3]). However, the structure of our Markov chain, as well as the technique 
used to bound its mixing rate seem different from previous results. 

The probabilistic Turing machine (PTM) is the most commonly used machine 
model in the study of randomized algorithms [14,18]. It is a standard Turing 
machine equipped with the ability to generate (or access) random bits in unit 
time. A PTM is deterministic, except for special coin-tossing states in which 
there are exactly two possible transitions, determined by the flip of an unbiased 
coin. 

Proposition 1. Given $n \in \mathbb{N}$ which is not a power of 2, let $A_n$ be a randomized
algorithm (PTM) which outputs numbers in $[1, n]$ and whose running time is
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hounded by tn e Let < tn he an upper bound on the number of random 
bits used by An. Let pi be the probability that An outputs i G [l,n]. There exists 

1 £ [ 1 7 such that 

\pi — l/n| > 

Proof. Omitted. See [9]. 

Intuitively, An has to place 2'^'^ balls (elementary events) into n bins. If n is not 
a power of 2, some bins have to receive at least one ball more than others. The 
situation is slightly different for Las Vegas type algorithms whose run time is 
not bounded. In the simplest case, the algorithm can use random bits, assign 
an equal number of elementary events to each number in [l,n], and decide to 
use more random bits or, simply, not terminate with the remaining probability. 
We will concentrate on algorithms whose running time is bounded, and refer to 
Las Vegas type algorithms only where appropriate. 

Since producing the exact uniform distribution on $[1, n]$ is not possible, we
try to generate integers in $[1, n]$ with an almost-uniform distribution. We use
the well-known relative pointwise distance (r.p.d.; see e.g. [18]) to measure the
closeness of the output distribution and the uniform distribution. The r.p.d.
between two probability distributions $p$, $q$ on a finite set $X$ ($q_i > 0$ for all
$i \in X$) is defined as

$\Delta(p, q) = \max_{i \in X} \frac{|p_i - q_i|}{q_i}$.

In the following, $q$ will always be the uniform distribution, and we write $\Delta(p)$
instead of $\Delta(p, q)$ to denote the r.p.d. of $p$ from the uniform distribution. Thus
$\Delta(p) = \max_{i \in X} |n p_i - 1|$.
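As a small worked example of this definition, the sketch below computes the r.p.d. from the uniform distribution with exact rational arithmetic. The function name `rpd_from_uniform` and the sample distribution are illustrative assumptions: three random bits give 8 equally likely patterns, 7 of which are mapped to distinct outcomes while the eighth is folded onto the first outcome.

```python
from fractions import Fraction

def rpd_from_uniform(p):
    """Relative pointwise distance of the distribution p (a list) from uniform."""
    n = len(p)
    return max(abs(n * p_i - 1) for p_i in p)

# 8 bit patterns spread over 7 outcomes: one outcome receives two patterns.
p = [Fraction(2, 8)] + [Fraction(1, 8)] * 6
print(rpd_from_uniform(p))  # 3/4
```

The doubled outcome dominates the maximum: $|7 \cdot 2/8 - 1| = 3/4$, whereas every other outcome contributes only $1/8$.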

The rest of this paper is organized as follows: In Sect. 2, we describe the 
Markov chain algorithm. Our main result on the bound of the mixing rate of 
the Markov chain is stated in this section. Section 3 analyzes the resource re- 
quirements of three alternative algorithms. Remarks and conclusions are given 
in Sect. 4. Proofs of the results that are omitted due to space constraints can be 
found in the full paper [9]. 

2 A Rapidly Mixing Circulant Markov Chain 

In this section, we describe an algorithm based on the simulation of a rapidly
mixing Markov chain $M$. In $O(N \log n)$ time, this algorithm produces a random
integer $i$ in the range $[1, n]$ with distribution $p$ such that



A{p) < n j3^ , where p = — 2a/2 — \J ^ — a/5 J 0.4087 . (1) 

The bound $\beta \approx 0.4087$ deserves attention in two respects. First, known
algorithms reduce the r.p.d. only by a factor of 0.5 in each step. Second, standard
methods for bounding the mixing rate of a Markov chain yield bounds which are
worse than 0.5. These issues will be addressed below.
2.1 Construction of M

We define an $n \times n$ transition matrix $P = (p_{ij})$ such that the corresponding
Markov chain $M$ on state space $\{1, 2, \ldots, n\}$ has the following properties: 1)
$M$ is ergodic with stationary distribution $\pi = (1/n, \ldots, 1/n)$; 2) $M$ is rapidly
mixing, i.e. the $N$-step transition matrix $P^N$ converges quickly to the limiting
probabilities; 3) $M$ can be simulated efficiently. That is, the time to simulate one
transition step is $O(\log n)$. The preprocessing time and space requirements for
$M$ are also $O(\log n)$. Given such $P$, the algorithm (referred to as Algorithm I)
simulates $N$ steps of $M$. The first condition guarantees that $M$ converges to the
uniform distribution, and the second condition ensures that a small number $N$
of simulation steps is sufficient. The third condition ensures that each simulation
step can be executed efficiently.

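An $n \times n$ circulant matrix $C = C(a_1, a_2, \ldots, a_n)$ is a matrix of the form

$$C = \begin{pmatrix} a_1 & a_2 & \cdots & a_n \\ a_n & a_1 & \cdots & a_{n-1} \\ \vdots & \vdots & \ddots & \vdots \\ a_2 & a_3 & \cdots & a_1 \end{pmatrix},$$

where each row is a single right circular shift of the row above it [7].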

Assume that $n$ is not a power of 2, and let $m = \lfloor \log n \rfloor$. Then $n/2 < 2^m < n$,
and $n$ can be written in the form $n = 2^m + p$ with $0 < p < 2^m$. Consider
symmetric, circulant $n \times n$ 0-1 matrices $C = C(0, a_2, a_3, \ldots, a_n)$ where exactly
$2^m$ of the entries $a_2, a_3, \ldots, a_n$ are equal to 1. Since we are forcing $C$ to be
symmetric, this imposes the condition $a_k = a_{n+2-k}$ for $k = 2, 3, \ldots, n$. For
example, for $n = 7$, we have $m = 2$ and $p = 3$. In this case there are three such
matrices: $C(0,1,1,0,0,1,1)$, $C(0,1,0,1,1,0,1)$, and $C(0,0,1,1,1,1,0)$. Each such
matrix $C$ defines an irreducible, aperiodic (i.e. ergodic) Markov chain $M$ on $n$
states $\{1, 2, \ldots, n\}$ whose transition matrix is $P = 2^{-m} C$. The symmetry of $C$
guarantees that the stationary distribution of the corresponding Markov chain $M$
is the uniform distribution on $\{1, 2, \ldots, n\}$. Note that the eigenvalues of $P$ and
$C$ are related by a constant factor $2^m$. Let $\lambda_1(C)$ denote the second largest
eigenvalue of $C$. It is well-known that the mixing rate of $M$ can be bounded by
$\lambda_1 = 2^{-m} \lambda_1(C)$. The following inequality for the r.p.d. follows from [18,6,13]:

$\Delta(p(N)) \le n \lambda_1^N$, (2)

where $p(N)$ is the distribution on the states of $M$ after $N$ simulation steps. We
consider the problem of picking the nonzero $a_k$'s so that $\lambda_1$ is minimized:

Theorem 1. Suppose n = 2'^ ^ p with 0 < p < 2^. There exists a symmetrie, 
eireulant nxn 0-1 matrix = C(0, U 2 , as, . . . , a^) with 2^ nonzero entries in 
its first row sueh that 
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Furthermore, the first row ofC^ eontains at most two symmetrieally plaeed bloeks 
of Fs starting at eolumn s = |~^] + 1. 

The theorem is our main result. Its proof is several pages long and had to be 
omitted due to space restrictions. See [9]. 
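To make the objects in this construction concrete, the following sketch computes the spectrum of $P$ for the three $n = 7$ example matrices listed earlier, using the standard fact that the eigenvalues of a circulant matrix are the discrete Fourier transform of its first row. It is an illustration only and does not construct the matrix of Theorem 1; the function name `spectrum_of_p` is hypothetical.

```python
import cmath

# First rows of the three symmetric circulant 0-1 matrices for n = 7 (m = 2).
ROWS = [(0, 1, 1, 0, 0, 1, 1), (0, 1, 0, 1, 1, 0, 1), (0, 0, 1, 1, 1, 1, 0)]

def spectrum_of_p(first_row, m):
    """Eigenvalues of P = C / 2^m, sorted in decreasing order."""
    n = len(first_row)
    eig = []
    for idx in range(n):
        z = sum(a * cmath.exp(2j * cmath.pi * idx * k / n)
                for k, a in enumerate(first_row))
        # A symmetric circulant matrix has real eigenvalues, so the imaginary
        # part is only rounding noise and is dropped.
        eig.append(z.real / 2 ** m)
    return sorted(eig, reverse=True)

for row in ROWS:
    print(row, [round(x, 4) for x in spectrum_of_p(row, 2)])
```

For $n = 7$ the three matrices turn out to share one spectrum, so this small case offers no real choice; the selection of nonzero entries only starts to matter for larger $n$.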

We take M = to be the Markov chain on {1, 2, . . . , n} whose transition 
matrix is P = P® = -^ C®. The structure of C® is such that ^ if and 

only if 

j G I /^ = 0, T • • • , 2^“^ — l}u{nT2— (sT/^) I /;; = 0, T • • • , 2^“^ — 1} . (3) 

Since $P$ is circulant, $p_{ij} = 2^{-m}$ if and only if $j$ is in a translate modulo $n$
of the set of indices in (3). Thus to move from a state $i$ of $M$ to state $j$, we only
need to generate a random binary number $r$ in the range $[0, 2^m - 1]$. We then use
the high order bit to select the translate of one of the two sets of consecutive
indices in (3). After this, the new state $j$ is simply the $(r+1)$-st smallest index
in the subset chosen. More formally, we describe the steps of this algorithm as
Algorithm I. Let RANDOM$[0, 2^m - 1]$ denote a procedure which returns a random
integer $r$ in the range $[0, 2^m - 1]$ or, equivalently, $m$ consecutive random bits
provided by our machine model (PTM).

Algorithm I : 

Input: n, N 
Output: i G [l,n] 
begin 

m:=[lognJ; + 

    cur_state := 1;
    for j := 1 to N do
    begin
        i := RANDOM[0, 2^m - 1];
        if i ∈ [0, 2^(m-1)) then cur_state := 1 + ((cur_state - 1 + s + i) mod n)
        else cur_state := 1 + ((cur_state - 1 + (n + 2 - (s + i - 2^(m-1)))) mod n)
    end
    i := cur_state;

return(i); 

end 

The number of operations required to take one step on the Markov chain $M$ is
$O(m) = O(\log n)$. Thus, the total running time of Algorithm I is $O(N \log n)$. By
(2) and Theorem 1, after starting from an arbitrary initial state and simulating
$N$ steps of $M$, the probability of being in some particular state $j$ does not differ
from $1/n$ by more than a constant multiple (w.r.t. $N$) of $\beta^N$, where $\beta$ is as
given in (1).
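The geometric decay of the r.p.d. can be observed exactly on a small instance. The sketch below is not Algorithm I itself: it tracks the full distribution of a circulant chain with exact rational arithmetic instead of sampling, and it uses the offsets of the $n = 7$ example matrix $C(0,1,1,0,0,1,1)$ rather than the matrix of Theorem 1. The function names are hypothetical.

```python
from fractions import Fraction

def step(dist, offsets, n):
    """One step of the chain: from state i move to i + d (mod n), d uniform."""
    new = [Fraction(0)] * n
    w = Fraction(1, len(offsets))
    for i, p_i in enumerate(dist):
        for d in offsets:
            new[(i + d) % n] += p_i * w
    return new

def rpd_after(n, offsets, steps):
    """Exact r.p.d. from uniform after the given number of steps from state 1."""
    dist = [Fraction(1)] + [Fraction(0)] * (n - 1)
    for _ in range(steps):
        dist = step(dist, offsets, n)
    return max(abs(n * p_i - 1) for p_i in dist)

OFFSETS = [1, 2, 5, 6]  # positions of the 1s in C(0,1,1,0,0,1,1), counted from 0
for N in (1, 2, 4, 8, 16):
    print(N, float(rpd_after(7, OFFSETS, N)))
```

For this particular chain the largest eigenvalue modulus below 1 is about 0.5617, and the printed values stay below $n \cdot 0.5617^N$, in line with inequality (2).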



2.2 Other methods of bounding the mixing rate 

We note that the bound of $\lambda_1 \le 0.4087$ is obtained by a detailed analysis, taking
special properties of circulant matrices into account. The well-known general
methods for estimating mixing rates, while being useful general purpose tools,
appear to be too coarse-grained to yield a similar bound. We outline this in the
following paragraphs. Details can be found in [9].

The conductance $\Phi$ [18], which measures the expansion of the transition graph,
is often used to bound the second largest eigenvalue of the transition matrix via
the inequality $\lambda_1 \le 1 - \Phi^2/2$. Since, by definition, $\Phi \le 1$, this method cannot
yield a better bound than $0.5 > 0.4087$. A closer analysis for the particular case
considered here shows that the conductance is significantly smaller than 1, and
consequently the bound obtained in this fashion is actually much larger than
0.5.

The method of [17,8], which bounds the second largest eigenvalue directly
by a direct canonical paths argument (as opposed to going via the conductance),
usually leads to tighter bounds than conductance-based methods. We can show
by means of a counting argument that this approach does not yield a better
bound on $\lambda_1$ than 0.7.

The coupling method tries to bound the mixing rate by a direct probabilistic
argument and without bounding the eigenvalues. The use of the coupling method
is based on [2], which bounds the mixing rate by $e^{-1/(2eT)}$, where $T$ is called the
coupling time. Basic but tedious steps show that $T \ge 2$, resulting in a mixing
rate of at least $0.912 > 0.4087$.
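The two numerical thresholds quoted in this subsection can be reproduced with a few lines of arithmetic. The conductance bound $1 - \Phi^2/2$ is taken from the text; the coupling bound is assumed here to have the form $e^{-1/(2eT)}$, a form chosen because it reproduces the quoted value 0.912 at $T = 2$. Both function names are hypothetical.

```python
import math

def conductance_bound(phi):
    """Eigenvalue bound 1 - phi^2 / 2 obtained from a conductance phi <= 1."""
    return 1 - phi ** 2 / 2

def coupling_bound(T):
    """Assumed form of the coupling-time bound on the mixing rate."""
    return math.exp(-1 / (2 * math.e * T))

print(conductance_bound(1.0))       # 0.5
print(round(coupling_bound(2), 3))  # 0.912
```

Since the first function decreases in `phi` and the second increases in `T`, the values at `phi = 1` and `T = 2` are the most favorable ones these two methods can give, and both exceed 0.4087.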

3 Alternative Algorithms 

In this section we analyze three alternative algorithms for the generation problem.
Algorithms II and IV are straightforward modular algorithms. Algorithm III is
a new algorithm based on the reduction from almost-uniform generation to
approximate counting [12].

Algorithm II: This algorithm is described in [10]: Generate a random sequence
of $m = \lceil \log n \rceil$ bits. If the sequence is the binary representation of an integer
$i_1$ in the range $[0, n-1]$, then return $i := i_1 + 1$. If not, generate another $m$-bit
random number $i_2$ using the same process. If after $N$ such trials, none of the
integers turns out to be in $[1, n]$, then return $i := i_N - 2^{m-1}$. A
more formal description of this algorithm is given as Algorithm II.

Algorithm II:

Input: n, N
Output: i ∈ [1, n]

    m := ⌈log n⌉;
    for j := 1 to N do
        i := RANDOM[0, 2^m - 1] + 1;
        if i ∈ [1, n] then return(i) and exit;
    return(i - 2^(m-1));

Proposition 2. Let pn{N) denote the output distribution of Alg. II. Then 
A{pn{N)) <2-^ . 
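Algorithm II translates almost line by line into code. In the sketch below, `algorithm_ii` follows the pseudocode, with Python's `getrandbits` standing in for RANDOM; `exact_distribution_ii` is a derivation added for this example (it is not taken from the paper) that computes the output distribution in closed form, so that the geometric decay of the r.p.d. in $N$ can be checked exactly. It assumes $n \ge 2$ and $N \ge 1$.

```python
import random
from fractions import Fraction

def algorithm_ii(n, N, rng=random):
    m = (n - 1).bit_length()          # smallest m with 2^m >= n
    for _ in range(N):
        i = rng.getrandbits(m) + 1    # uniform in [1, 2^m]
        if i <= n:
            return i
    return i - 2 ** (m - 1)           # fold the last failed draw back into [1, n]

def exact_distribution_ii(n, N):
    """Output probabilities p_1, ..., p_n of algorithm_ii as exact fractions."""
    m = (n - 1).bit_length()
    q = Fraction(2 ** m - n, 2 ** m)  # probability that a single trial fails
    base = sum(q ** j for j in range(N)) / 2 ** m
    extra = q ** (N - 1) / 2 ** m     # mass folded back after the N-th failure
    lo, hi = n + 1 - 2 ** (m - 1), 2 ** (m - 1)
    return [base + (extra if lo <= k <= hi else 0) for k in range(1, n + 1)]

def rpd_ii(n, N):
    return max(abs(n * p - 1) for p in exact_distribution_ii(n, N))
```

For $n = 7$ and $N = 1$ this gives $p_4 = 1/4$ and $p_k = 1/8$ otherwise, and `rpd_ii(7, N)` shrinks by a factor of at least 2 with each additional trial beyond the first.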
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Proof. Omitted. 



Algorithm II can be run in Las Vegas mode by dropping the upper limit of 
N loop iterations. In this case, the expected running time E{n) is 




< 






< 



(i-h^ 



< 



8 log n , 



where $r = 2^m - n$. Thus the expected running time of Algorithm II is no worse
than $8 \log n$, independently of the parameter $N$. Using Chernoff bounds, it is easy
to show that the running time is unlikely to exceed its expectation significantly.



Algorithm III: There is a close relation between almost-uniform generation
problems and the corresponding approximate counting problems (computing the
number of elements in the set) [12]. In our case, the solution to the counting
problem is simply $n$, and the solutions of relevant subproblems are easily
derived. This makes it possible to design a generation algorithm based on the
well-known reduction from almost-uniform generation to approximate counting
of [12].

Given a bitstring $s$, let $\mathrm{solns}(s) = |\{x \in [0, n-1] : \exists v : sv = x\}|$ be the
number of elements of $[0, n-1]$ whose binary representation begins with $s$. These
solutions of counting subproblems are easily computed. The algorithm generates
a random element of $[0, n-1]$ one bit at a time, starting with the most significant
bit. At the start of the $k$-th round, the $k-1$ most significant bits have been
determined. The invariant is that the probability of producing any given prefix
is proportional to the number of elements of $[0, n-1]$ whose most significant bits
coincide with this prefix. It is easy to show by induction that this relation will
hold if the next bit is set to 1 with probability solns(prefix ∘ 1)/solns(prefix),
where ∘ denotes concatenation. If, at any given point, the prefix is such that
solns(prefix) $= 2^i$ (for some $i \ge 0$), the process can be stopped. Algorithm III
summarizes these steps.



Algorithm III:

Input: n > 1
Output: i ∈ [1, n]

prefix := ε; k := ⌈log n⌉; (* bitlength of (n − 1) *)
repeat
    if solns(prefix ∘ 1) = 0 then prefix := prefix ∘ 0
    else
        with probability divide[solns(prefix ∘ 1), solns(prefix)],
            set prefix := prefix ∘ 1;
        otherwise set prefix := prefix ∘ 0;
    k := k − 1
until k = 0 or solns(prefix) is a power of 2;
prefix := prefix ∘ 0^(k − log solns(prefix));
if solns(prefix) > 1 then prefix := prefix ∘ RANDOM[0, solns(prefix) − 1];
return(prefix + 1);
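The invariant behind Algorithm III can be checked mechanically. The following Python sketch is not the authors' implementation: it computes the exact output distribution of the bit-by-bit procedure with rational arithmetic instead of sampling, it omits the early-stopping shortcut (which does not change the distribution), and the helper names `solns` and `output_distribution` are hypothetical.

```python
from fractions import Fraction


def solns(prefix: str, n: int, k: int) -> int:
    """Number of x in [0, n-1] whose k-bit binary representation starts with prefix."""
    free = k - len(prefix)                       # bits still undetermined
    lo = (int(prefix, 2) << free) if prefix else 0
    hi = lo + (1 << free)                        # exclusive upper end of the block
    return max(0, min(hi, n) - lo)


def output_distribution(n: int) -> dict:
    """Exact distribution over [1, n] when each bit is 1 with probability
    solns(prefix + '1') / solns(prefix)."""
    k = (n - 1).bit_length()                     # bit length of n - 1, for n > 1
    dist = {"": Fraction(1)}
    for _ in range(k):
        nxt = {}
        for prefix, p in dist.items():
            total = solns(prefix, n, k)
            for bit in "01":
                cnt = solns(prefix + bit, n, k)
                if cnt:                          # unreachable prefixes are dropped
                    nxt[prefix + bit] = p * Fraction(cnt, total)
        dist = nxt
    return {int(s, 2) + 1: p for s, p in dist.items()}
```

For every `n` tried, each value in `[1, n]` receives probability exactly `1/n`, which is the telescoping product predicted by the invariant.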
As stated, Algorithm III is an exact uniform generator. However, its running
time is unbounded because the binary representation of $p_i$ := solns(prefix ∘ 1)/solns(prefix) may
not be finite. One obtains an approximate version of the algorithm by truncating
this fraction to some finite number $m$ of bits. Steps similar to those of [18] show
that $\Delta(p_{III}) \le 2^{-k}$ if $m \ge 2\lceil \log n \rceil + k$. Thus, achieving an accuracy of $2^{-k}$
requires a total of at most $2\lceil \log n \rceil^2 + k\lceil \log n \rceil$ random bits. Note that there
are only $\log n$ relevant values of $p_i$. Those values can be obtained and stored in
a precomputation step. Thus, the algorithm needs $O(\log^2 n + k \log n)$ time and
space in the worst case.

The algorithm can be run in Las Vegas mode if the probabilistic decision 
(‘with probability . . . ’) is implemented appropriately. The details of this part 
of the algorithm can be found in [9]. For both the standard and the Las Vegas 
version, we have 

Proposition 3. The expected running time of Alg. III is $O(\log n)$.

Again, one can use Chernoff bounds to show that the actual running time is 
concentrated around its expectation. 

Algorithm IV: This algorithm appears to be widely used in practice: fix $m \ge \log n$,
generate a random integer $M$ in the range $[0, 2^m - 1]$ by generating $m$
consecutive random bits, and output $M \bmod n$.

Algorithm IV:

Input: n, m
Output: i ∈ [1, n]
begin
    M := RANDOM[0, 2^m − 1];
    return((M mod n) + 1);
end

Proposition 4. Let $p_{IV}(m)$ denote the output distribution of Alg. IV. Then
$\Delta(p_{IV}(m)) \le n 2^{-m}$.
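As a hedged illustration of Algorithm IV and of the bound in Proposition 4, the Python sketch below draws `m` random bits and reduces them modulo `n`, and separately computes the exact relative deviation from uniformity. It assumes that the distance in the proposition is the relative pointwise distance max_i |p(i) − 1/n| / (1/n); that definition is not restated in this section, and the function names are hypothetical.

```python
from fractions import Fraction
import random


def algorithm_iv(n: int, m: int, rng: random.Random) -> int:
    """Draw m random bits, reduce modulo n, and shift the result into [1, n]."""
    M = rng.getrandbits(m)          # uniform over [0, 2^m - 1]
    return (M % n) + 1


def relative_pointwise_distance(n: int, m: int) -> Fraction:
    """Exact max_i |p(i) - 1/n| / (1/n) for the output distribution of algorithm_iv."""
    q, r = divmod(2 ** m, n)        # r residues occur q + 1 times, the others q times
    counts = {q, q + 1} if r else {q}
    return max(abs(Fraction(c, 2 ** m) * n - 1) for c in counts)
```

Because each residue occurs either ⌊2^m/n⌋ or ⌈2^m/n⌉ times, the deviation computed this way never exceeds n·2^(−m), so choosing `m` at least `k + log n` keeps it below 2^(−k).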

We calculate the number of bit operations required by Algorithm IV. Let
$b = \lfloor \log n \rfloor + 1$. Thus $M$ and $n$ are $m$-bit and $b$-bit integers with $m \ge b$. We can
consider algorithms of varying complexity for the calculation of the remainder
$R = (M \bmod n)$ depending on the sizes of the numbers involved. Straightforward
division of $M$ by $n$ followed by a multiplication and subtraction to calculate
$R$ requires $O(mb)$ bit operations. Using the asymptotically faster
Schönhage-Strassen algorithm for large integers, the integral quotient of a $2b$-bit number
by a $b$-bit number, as well as the product of two $b$-bit numbers, can be obtained
in $O(b \log b \log\log b)$ bit operations [16,1]. To use this algorithm for remainder
calculation we prepend the binary representation of $M$ with zeros if necessary
and assume that $b - 1$ divides $m$. Write

$M = \sum_{i=0}^{m/(b-1)-1} M_i \, 2^{i(b-1)}$

where each $M_i$ is $b - 1$ bits. The remainders $r_i = 2^{i(b-1)} \bmod n$ for
$i = 1, 2, \ldots, m/(b-1) - 1$ can be computed by $m/(b-1)$ multiplications and divisions
requiring $O(b \log b \log\log b)$ bit operations each, if the Schönhage-Strassen
algorithm is used. After this phase, each quantity $M_i r_i \bmod n$ can be computed with
an additional $O(b \log b \log\log b)$ bit operations. Finally the resulting $m/(b-1)$
remainders are summed up modulo $n$ in another $O(\frac{m}{b-1} b) = O(m)$ bit
operations. The total number of operations required for the computation of $R$
becomes $O(m \log b \log\log b) = O(m \log\log n \log\log\log n)$.
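The block decomposition can be sketched in Python as follows. This is only an illustration of the bookkeeping, under the assumption that the blocks are the base-2^(b−1) digits of M: it relies on Python's built-in integer arithmetic rather than the Schönhage-Strassen algorithm, and the function name `remainder_by_blocks` is hypothetical.

```python
def remainder_by_blocks(M: int, n: int, m: int) -> int:
    """Compute M mod n for an m-bit M by splitting M into (b-1)-bit blocks,
    where b = floor(log2 n) + 1 is the bit length of n."""
    if n < 2 or not 0 <= M < (1 << m):
        raise ValueError("need n >= 2 and 0 <= M < 2^m")
    b = n.bit_length()
    w = b - 1                         # block width in bits
    blocks = -(-m // w)               # ceil(m / w); M is implicitly zero-padded
    mask = (1 << w) - 1
    base = (1 << w) % n               # 2^(b-1) mod n
    r = 1 % n                         # r_0 = 2^0 mod n
    total = 0
    for i in range(blocks):
        M_i = (M >> (i * w)) & mask   # i-th block of M
        total = (total + M_i * r) % n # accumulate M_i * r_i modulo n
        r = (r * base) % n            # r_{i+1} = r_i * 2^(b-1) mod n
    return total
```

Each loop iteration multiplies and reduces numbers of at most about 2b bits, which is the step the text proposes to carry out with fast multiplication and division.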

4 Comparisons and Conclusions 

The maximum relative error between the generated distribution and the uniform 
distribution goes to zero geometrically for each of the four algorithms considered 
here: with rate 0.5 for Algorithms II, III, IV (using standard division), and with 
rate approximately 0.4087 for Algorithm I. 

The following table compares the algorithms from a different perspective. It
lists the resources required by each to produce one random number with error
bound $2^{-k}$ (r.p.d. from uniformity), based on the bounds derived in the previous
sections.



Algorithm     Worst case                                                       Average case
              time                                    random bits              time            random bits

I             O((k + log n) log n)                    0.775 (k + log n) log n  cf. worst case  cf. worst case
II            O(k log n)                              k log n                  O(log n)        O(log n)
III           O((k + log n) log n)                    (k + 2 log n) log n      O(log n)        O(log n)
IV (simple)   O((k + log n) log n)                    k + log n                cf. worst case  cf. worst case
IV [16]       O((k + log n) log log n log log log n)  k + log n                cf. worst case  cf. worst case



The algorithms have similar worst-case running times, with Algorithm II
being the fastest. The faster convergence rate of the Markov chain is
hidden in the big-O notation. However, it reduces the number of random bits
required by a factor of 0.775. Algorithm IV requires the smallest number of random
bits and comes to within two bits of the lower bound of Prop. 1. Algorithms II
and III can stop prematurely. Their average-case running times do not depend
on $k$ and are well below their worst-case times. This was reflected in a series
of experiments we performed in which Algorithms II and III were significantly
faster than Algorithms I and IV.

In the context of the construction of the Markov chain used in Algorithm I,
one can address the problem of picking the best possible $n \times n$ 0-1 circulant matrix
(in terms of the magnitude of the modulus of the second largest eigenvalue) for an
arbitrary distribution of 1's in the first row. Theorem 1 only gives an upper bound
for the modulus of the second largest eigenvalue and only in case the matrix
is symmetric and the row sums are $2^{\lfloor \log n \rfloor}$. The advantage of the particular
distribution of the 1's considered in Theorem 1 as blocks in the first row of the
matrix is small storage and ease of transition selection for the associated Markov
chain. Such a distribution, however, is not necessarily optimal for the solution
of the more general problem in which constraints of space and constructibility
are not critical issues.



References 

1. A. Aho, J. Hopcroft, and J. Ullman. Design and Analysis of Computer Algorithms.
Addison-Wesley Publishing Company, 1974.

2. D. Aldous. Random walks on finite groups and rapidly mixing Markov chains. In
Séminaire de Probabilités XVII, Lecture Notes in Mathematics 986, pages 243-297.
Springer-Verlag, 1982.

3. N. Alon and Y. Roichman. Random Cayley graphs and expanders, 1996.
Manuscript.

4. N. Biggs. Algebraic Graph Theory, page 16. Cambridge University Press, 1974.

5. R. Bubley, M. Dyer, and C. Greenhill. Beating the 2Δ bound for approximately
counting colourings: A computer-assisted proof of rapid mixing. In Proceedings of
the Ninth Annual ACM-SIAM Symposium on Discrete Algorithms, San Francisco,
California, 1998.

6. E. Çinlar. Introduction to Stochastic Processes. Prentice-Hall Inc., 1975.

7. P.J. Davis. Circulant Matrices. Wiley, 1979.

8. P. Diaconis and D. Stroock. Geometric bounds for eigenvalues of Markov chains.
Annals of Applied Probability, 1:36-61, 1991.

9. Ö. Eğecioğlu and M. Peinado. Algorithms for Almost-uniform Generation with an
Unbiased Binary Source. Technical Report TRCS98-04, University of California
at Santa Barbara, 1998, (http://www.cs.ucsb.edu/TRs/).

10. L. Goldschlager, E.W. Mayr, and J. Ullman. Theory of parallel computation.
Unpublished Notes, 1989.

11. I.S. Gradshteyn and I.M. Ryzhik. Table of Integrals, Series, and Products, page 30.
Academic Press Inc., 1980.

12. M. Jerrum, L. Valiant, and V. Vazirani. Random generation of combinatorial
structures from a uniform distribution. Theoretical Computer Science, 43:169-188,
1986.

13. J.G. Kemeny and J.L. Snell. Finite Markov Chains. Springer-Verlag, 1976.

14. R. Motwani. Lecture notes on approximation algorithms. Technical report,
Stanford University, 1994.

15. M. Peinado. Random generation of embedded graphs and an extension to
Dobrushin uniqueness. In Proceedings of the 30th Annual ACM Symposium on Theory
of Computing (STOC'98), Dallas, Texas, 1998.

16. A. Schönhage and V. Strassen. "Schnelle Multiplikation großer Zahlen",
Computing, No. 7 (1971), 281-292.

17. A. Sinclair. Improved bounds for mixing rates of Markov chains and
multicommodity flow. Combinatorics, Probability & Computing, 1:351-370, 1992.

18. A. Sinclair. Algorithms for Random Generation and Counting. Progress in
Theoretical Computer Science. Birkhäuser, Boston, 1993.



Improved Algorithms for Chemical Threshold Testing Problems

Annalisa De Bonis, Luisa Gargano, and Ugo Vaccaro

Dipartimento di Informatica e Applicazioni, Università di Salerno, 84081 Baronissi (SA), Italy

{debonis,lg,uv}@dia.unisa.it



Abstract. We consider a generalization of the classical group testing 
problem. Let us be given a sample contaminated with a chemical sub- 
stance. We want to estimate the unknown concentration c of this sub- 
stance in the sample. There is a threshold indicator which can detect 
whether the concentration is at least a known threshold. We consider
both the case when the threshold indicator does not affect the tested
units and the more difficult case when the threshold indicator destroys
the tested units. For both cases, we present a family of efficient
algorithms each of which achieves a good approximation of c using a small
number of tests and of auxiliary resources. Each member of the family 
provides a different tradeoff between the number of tests and the use of 
other resources involved by the algorithm. Previously known algorithms 
for this problem use more tests than most of our algorithms do. For 
the case when the indicator destroys the tested units, we also describe 
a family of efficient algorithms which estimates c using only a constant 
number of tubes. 



1 Introduction 

The well known group testing problem originated in the area of chemical analysis 
as a blood test technique employed to detect the infected members of a popula- 
tion [5,10]. Since then, it has become clear that the group testing model occurs 
in a variety of situations, ranging from molecular biology applications [1,3,8], to 
multiaccess communication [11], software development [9] and many others. We 
refer to the monograph of Du and Hwang [6] for an excellent treatise on group 
testing. In the classical group testing problem, there is a set of elements each 
of which may be either good or defective. The problem consists of identifying 
all the defective elements using a minimum number of group tests. Damaschke 
[4] has considered a generalization of the group testing problem in which the 
problem consists of estimating the concentration of a chemical substance in a 
given sample contaminated with that substance. The search model described 
by Damaschke uses, as test device, a threshold indicator which gives a positive 
response if and only if the concentration of the tested sample is at least a fixed 
threshold. Tests are performed on units obtained by first diluting the original
sample with water and then by either diluting or mixing previously generated
units. The model described in [4] allows each unit to be tested more than once. It
is the purpose of this paper to proceed further along the line of research initiated
in [4], both by improving some of the results given therein and by considering a
variation of the concentration problem which models the more realistic situation
in which units of liquid which have already been tested cannot be tested again.

The model. We assume that a single unit of the sample, whose concentration
we want to estimate, has been given. Tests are performed by means of a threshold
indicator which gives a positive response if and only if the concentration of the
tested sample is at least a fixed threshold. This threshold is adopted as the unit
of measure of the concentration. For that reason, a positive response of our
indicator means that the concentration in the tested sample is at least 1. Tests are
performed on units of liquid obtained by means of merge operations. A merge
operation may either involve units of liquid with different concentrations or add
units of water, that is, liquid with concentration equal to 0, to units of contaminated
liquid. The very first merge operation consists of diluting the original unit of sample
with water. We assume that an arbitrarily large reservoir of water is available. Let
$c$ denote the concentration of the original sample. A unit of liquid generated
during the search process has concentration equal to $rc$, for some $r \le 1$. The
positive number $r$ is called the concentration ratio of this unit of liquid. We denote
$k$ units of liquid with concentration equal to $rc$ by the symbol $k \times r$. Therefore,
a sequence $n_1 \times r_1, \ldots, n_m \times r_m$ denotes a situation where $n_i$ units
with concentration ratio $r_i$ are available, for $i = 1, \ldots, m$.

More precisely, the search model is defined by the following assumptions: 

- Each test is performed on a single unit with unknown concentration $rc$. A
positive answer indicates that $rc \ge 1$.

- An integer number of units can be extracted from each available sample.

- It is possible to merge an arbitrary number $n$ of units. Merging $n$ units with
concentrations $c_1, c_2, \ldots, c_n$ yields $n$ units each with concentration
$(c_1 + c_2 + \cdots + c_n)/n$.

Our goal is to estimate the unknown concentration of the original sample up to 
some given accuracy. 
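The search model can be mirrored by a very small Python sketch. It is only an illustration under the assumptions just listed: concentrations are exact rationals measured in units of the threshold, and the names `WATER`, `merge` and `threshold_test` are hypothetical.

```python
from fractions import Fraction

WATER = Fraction(0)  # a unit of water has concentration 0


def merge(units):
    """Merging n units yields n units, each with the average concentration."""
    n = len(units)
    average = sum(units, Fraction(0)) / n
    return [average] * n


def threshold_test(concentration) -> bool:
    """Positive response if and only if the concentration is at least 1."""
    return concentration >= 1
```

For example, diluting one unit of concentration 6 with one unit of water gives `merge([Fraction(6), WATER])`, that is, two units of concentration 3, each of which still tests positive.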

Our results and outline of the paper. Our main tool is the recognition of
our problem as a case of the unbounded search problem [2] with particular
constraints. We first consider the more interesting case when tests performed
by the threshold indicator destroy the tested units and, as a consequence, units
which have already been tested must be discarded. This case is treated in Sect. 2,
where we give a family of strategies which provide tradeoffs between the
number of tests, the number of merge operations and the quantity of water
involved in the search process. Our result implies an algorithm which finds an
interval of length at most one including $c$, $c > 32$, by using
$\lfloor \log c \rfloor + 2\lfloor \log\log c \rfloor + 3$ tests, at most
$5\lfloor \log c \rfloor + 4$ units of water and a number of merge steps quadratic
in $\lfloor \log c \rfloor$. Our algorithm compares favourably with the best algorithm given
in [4], in that, even though it works under a more difficult test model, it performs
a smaller number of tests while still using a logarithmic number of units of water.
Both algorithms involve a logarithmic number of tubes in the search process. In
an effort to reduce the number of tubes involved in the search process, in Sect. 2
we also consider the case when the algorithm has only a constant
number of tubes at its disposal.

The case when the threshold indicator does not affect the tested units is
considered in Sect. 3. The best result given by [4] under this model is an algorithm
which finds an interval of length at most one including $c > 16$, using
about $2 \log c$ tests, $2 \log c$ units of water and $\log^2 c$ merge steps. In Sect. 3, we
describe a family of algorithms for approximating $c$ with an error of at most one
under this model. In particular, we present an algorithm which approximates
$c > 8$ with this accuracy by performing $\lfloor \log c \rfloor + 2\lfloor \log\log c \rfloor + 3$ tests and by
using at most $3\lfloor \log c \rfloor + 3$ units of water and a number of merge operations
quadratic in $\lfloor \log c \rfloor$.

Finally, we remark that our best algorithms require a number of tests not far 
from the known lower bound on the number of tests for unconstrained unbounded 
search algorithms. 

Due to space limits, most of the proofs are omitted from this extended ab- 
stract. 

2 Testing in the Destructive Model 

In this section we consider the problem of approximating the unknown concen- 
tration of a given sample by means of tests which affect the tested units. 

In the proof of the following lemma (and of Lemma 3 of Sect. 3) we mostly 
use the proof technique introduced in [4], properly modified to handle the more 
complicated case of destructive tests not considered in [4]. 

Lemma 1. Consider one unit of sample with unknown concentration $c \ge 2$,
which is known to lie inside the interval $[2^{p-1}, 2^p)$. For any $t < 2^{p-2}$, there exists
an algorithm which determines an interval of length at most $\frac{c^2}{2^{t+p} - c}$ containing

c. The algorithm uses t tests, at most ^ YXnlp ^2 T Tuerge operations 

and at most pF^^t units of water. 

Proof. Our algorithm performs a binary search for $1/c$ inside the interval $(1/2^p, 1/2^{p-1}]$.
Each test decreases the length of the current interval containing $1/c$ by one half
of its value before the test. After $i$ tests, $1/c$ has been confined inside an
interval of length $1/2^{p+i}$ whose left end and right end are denoted by $a_i$ and $b_i$
respectively. The $(i+1)$-th test is performed on a unit with concentration
ratio $(a_i + b_i)/2$ obtained by merging one unit with concentration ratio $a_i$ and
one unit with concentration ratio $b_i$. Let $a_0$ and $b_0$ denote the values $1/2^p$ and
$1/2^{p-1}$ respectively. In order to get started, the algorithm needs to generate a unit
with concentration ratio $a_0$ and a unit with concentration ratio $b_0$. Starting with
$r_0 = 0$ and $r_1 = 1$, the algorithm generates a sequence of concentration ratios
$r_2, \ldots, r_p, r_{p+1}$ such that $r_i = 1/2^{i-1}$. Two units with concentration ratio $r_i$ are
obtained by merging a unit with concentration ratio $r_0$ (water) and a unit with
concentration ratio $r_{i-1}$, for $2 \le i \le p + 1$. It is $a_0 = r_{p+1}$ and $b_0 = r_p$. Once
$1 \times r_2, \ldots, 1 \times r_p, 2 \times r_{p+1}$ have been generated, the algorithm starts the binary
search for $1/c$ inside $(r_{p+1}, r_p]$. The algorithm performs the $i$-th test on a unit
with concentration ratio $r_{p+i+1} = (a_{i-1} + b_{i-1})/2$. Two units with concentration
ratio $r_{p+i+1}$ are obtained by merging one unit with concentration ratio $a_{i-1}$ and
one unit with concentration ratio $b_{i-1}$. If the response of the test is positive then
the algorithm sets $a_i = a_{i-1}$ and $b_i = r_{p+i+1}$, otherwise it sets $a_i = r_{p+i+1}$ and
$b_i = b_{i-1}$. In the former case we say that $a_i$ is the mate of $r_{p+i+1}$ while in the
latter case we say that $b_i$ is the mate of $r_{p+i+1}$. From this definition, one has
that $\{r_{p+i+1}, \text{mate of } r_{p+i+1}\} = \{a_i, b_i\}$. For each $k$, the ratio $r_k$ coincides with
one of the extremes of an interval of length $1/2^{k-1}$ containing $1/c$. Extending the
definition of mate also to $r_1, \ldots, r_{p+1}$, the other extreme of the above said interval
is called the mate of $r_k$. Observe that for any $i \le p$ the mate of $r_i$ is $r_0$. After $t$
tests, the unknown value $1/c$ has been confined inside the interval $(a_t, b_t]$ of length
$1/2^{p+t}$. Therefore, $c$ has been confined inside the interval $[1/b_t, 1/a_t)$, which has length

$\frac{1}{a_t} - \frac{1}{b_t} = \frac{b_t - a_t}{a_t b_t}$.

Since it is $b_t - a_t = 1/2^{p+t}$, $b_t \ge 1/c$ and $a_t = b_t - 1/2^{p+t} \ge 1/c - 1/2^{p+t}$, then one has

$\frac{b_t - a_t}{a_t b_t} \le \frac{c^2}{2^{t+p} - c}$.

Our goal is to prove that, for any $t$ in the range allowed by the lemma, it is possible
to generate a unit with concentration ratio $r_{p+t+1}$ using at most $p + 2t$ units of water
and at most the number of merge operations stated in the lemma. Following [4], the
algorithm can be described as a pebble game on an infinite acyclic directed graph. The nodes of
the graph represent the concentration ratios $r_i$, and for each $i \ge 1$ two directed
edges enter $r_{i+1}$, one starting from $r_i$, the other from the mate of $r_i$. We can
see nodes $r_0, r_1, \ldots, r_i, \ldots$ as arranged from left to right in the graph. For each
$i$, $P_i$ denotes the number of pebbles on $r_i$. A node which contains one or
more pebbles is said to be occupied. Initially only $r_0$ and $r_1$ are occupied: $r_0$ contains
an infinite number of pebbles while $r_1$ contains one pebble. If both $r_i$ and its
mate are occupied then it is possible to move one pebble from each of these two
nodes to $r_{i+1}$. Let $g$ be a node index specified by the game. The goal of the game
is to deliver a pebble to $r_g$ using a small number of steps and a small number of
units of water. In our test problem a pebble on node $r_i$ represents a unit with
concentration ratio $r_i$. The pebbles on $r_0$ represent the water reservoir while the
single pebble on $r_1$ represents a single unit of the sample whose concentration
we are trying to estimate. For $i > 0$, let $r_i$ and $r_j$ be the predecessors of $r_{i+1}$,
that is, the two nodes which the two edges entering $r_{i+1}$ start from. The
concentration ratio $r_{i+1}$ is the arithmetic average of $r_i$ and $r_j$; moreover, moving two
pebbles to $r_{i+1}$ corresponds to merging one unit with concentration ratio $r_i$ and
one with concentration ratio $r_j$. If a node $r_i$ is occupied then it is possible to test
a unit with concentration ratio $r_i$. The basic step of our game strategy consists
of occupying the node with largest index which is empty and whose predecessors
are both occupied. The game terminates when a pebble is delivered to the goal
node $r_g$. A unit with concentration ratio $r_i$ is tested only when $r_i$ is occupied for
the first time. As a consequence of this test, which destroys the tested unit, one
of the two pebbles on $r_i$ is discarded. Notice that each node except $r_0$ contains at
most two pebbles. The statement of the lemma is a consequence of the following
claims, whose proofs are omitted.






Claim 1 Let t>0. If s = p 1 -\- 1 is the index of the rightmost occupied node 
then Tp+i, • • • , Tg all together contain at most t + 2 pebbles. 

Claim 2. If s is the index of the rightmost occupied node then nodes ri , • • • , 
all together contain at most s pebbles. If s > p-\-l and was destination of the 
last move then ri • • • , all together contain at most s — 1 pebbles. 

Claim 3. If ri, • • • , Tp are all empty then , • • • , together contain 2^ ^ — tT 1 
pebbles. 

Claim 4. In order to deliver a pebble to , a total of ^ T Xlm=p +2 T merge 
operations are sufficient. 

Let s = p-\- 1-\- 1, It IS evident that if one of p±^ • • • ^ Pp is not zero then a further 
move is possible. From Claim 3 it follows that if ri, • • • are all empty then 
Tp+i, • • • jT's contain 1 + 1 pebbles. Claim 1 implies that tT2 > 

from which it follows that it is possible to deliver a pebble to Vg = Vp^t+i for 
any t < 2^“^. From Claim 4 one has that this can be done by using at most 
(p+t+i ) — ^ 'Y^rn ^+2 T ^orge operations. Since Claim 2 implies that at the end 
of the game $r_1, \ldots, r_s$ contain at most $s - 1$ pebbles and since $t$ pebbles have
been discarded as a consequence of the $t$ tests, then the algorithm uses at most
$s - 1 + t = p + 2t$ units of water. □

Corollary 1. Consider one unit of sample with unknown concentration $c$ which
is known to lie inside the interval $[2^{p-1}, 2^p)$, with $p \ge 5$. There exists an algorithm
to find an interval eontaining c of length at most The algorithm uses pT 2 
tests, at most T ^p T % merge steps and at most 3p T 4 units of water. 

In the following we describe a family of algorithms which determine $\lfloor \log c \rfloor$. First,
we introduce some notation. Let $g(i,j)$ denote the function of two non-negative
integers defined recursively as follows: $g(i,j) = i$ if $j = 0$ and $g(i,j) = 2^{g(i,j-1)}$ if
$j \ge 1$. For all non-negative integers $j$ such that $c > g(0, j-1)$, $\log^{(j)} c$ is defined
as follows: $\log^{(j)} c = c$ if $j = 0$ and $\log^{(j)} c = \log \log^{(j-1)} c$ if $j > 0$.
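A direct Python transcription of these two definitions may help when checking the bounds that follow. It is a sketch that assumes logarithms are taken to base 2, which matches the recursion for $g$; the function names are hypothetical.

```python
import math


def g(i: int, j: int) -> int:
    """Tower function: g(i, 0) = i and g(i, j) = 2 ** g(i, j - 1) for j >= 1."""
    return i if j == 0 else 2 ** g(i, j - 1)


def iterated_log(c: float, j: int) -> float:
    """log^(j) c: the base-2 logarithm applied j times (identity for j = 0)."""
    for _ in range(j):
        c = math.log2(c)
    return c
```

For instance, `g(0, 3)` is 4 and `g(1, 3)` is 16, and applying `iterated_log` with the same `j` undoes the tower: `iterated_log(g(1, 3), 3)` gives 1.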

Let $j$ be a fixed positive integer. The following algorithm first determines,
in stage(0), the smallest integer $l$ ($0 \le l < j$) such that $g(i-1, j-l) \le c < g(i, j-l)$
for some $i \ge 2$. Such value of $i$ is equal to $\lfloor \log^{(j-l)} c \rfloor + 1$. In stage($s$),
$s = 1, \ldots, j-l-1$, the algorithm searches for $k_{j-l-s} = \lfloor \log^{(j-l-s)} c \rfloor + 1$.
Algorithm $A_j$ ($j \ge 1$):
stage(0):

Let us consider a diluting sequence which iteratively applies the following step
until a unit with the desired concentration is obtained.

step b: Add one unit of water to a given unit of liquid, obtaining two
units with half of the concentration of the given unit. Store one of these two
units and use the remaining one for the successive application of step b.
Starting with the given unit of sample and iteratively performing step b, it is 
possible to generate the sequence of A = {1 x Consider the subsequence 

{lx of S. The i-th test of this stage involves one unit with concen- 
tration ratio if the result of this test is positive then the next terms of S 
up to 1 X are generated, otherwise no more terms are generated. Let kj

denote the index i of the last term 1 x generated, 

set / = 0 

while kj-i = 1 repeat the following steps: 

perform a test on a unit with concentration ratio j-i-i) 

else set kj-i- 



i-i 



Increment I by one. 
for r = / + 1 , • • • , j - 1 
stage(r — /): 

set Lq = and Rq = 



for i 



set M. 



kj-r-\-l 1 

r L^-i-\-Ri- 



perform a test on a unit with concentration ratio 



’i-i 



set kj- 



if the test is positive then set L[ = M[ and R\ = R[ 
else set L[ = Ll_i and R^ = M- 

Dr 

V 



It is easy to prove by induction that $k_m = \lfloor \log^{(m)} c \rfloor + 1$ if $c > g(0, m)$ and 1
otherwise. Therefore, the last stage of the algorithm yields $k_1 = \lfloor \log c \rfloor + 1$. Notice
that the units tested during stage(0), ..., stage($j - l - 1$) are among those terms
of $S$ generated during stage(0). This stage requires $\log(g(k_j, j)) = g(k_j, j-1)$
units of water and merge operations.

If c > 32, the algorithm Aj^ for j > 2, does not need to test the unit with 
concentration ratio As far it concerns the algorithm Ai, the test on 1 x ^ 
performed in stage(O) by this algorithm may be skipped. Therefore, if c > 32 
then the unit with concentration ratio ^ is preserved after the execution of Aj 
and the algorithm of Corollary 1 can be used to confine | inside an interval of 
length ^ Since it is 2^^“^ < | then the algorithm of Corollary 1 requires 

ki~\-l tests and at most — — + merge steps and 3ki T 1 units 

of water. The above strategy finds an interval of length at most | including the 
unknown concentration c > 32 using kj -\- 1 -\- Xlr=[+i(%-’^+i ~ 1) + + 1 tests, 

at most g{kjA — 1) + 3/^i T 1 units of water and at most g{kjA — 1) + f (^i — 
1)^ + ^(^1 “ 1) fi steps. Hence, the following theorem holds: 

Theorem 1. For each j > 1 there exists an algorithm Aj which finds an in- 
terval of length at most 1 including an unknown concentration c > 32. Let I 
be the smallest integer i such that [log^-^”^^ cj > 1. The algorithm Aj performs 
[log^-^“^^ cj + [log^-^”^”^^ cj + • • • + [log^^^ c\ + [log^-^^ cj + / + 3 tests and uses at 
most — l) + 3[log cJ +4 units of water and at most — 1) + 1 [log cj^ + 

^ [log cJ + II merge steps. 



Corollary 2. There is an algorithm which finds an interval of length at most 1
including the unknown concentration $c > 32$ with $\lfloor \log c \rfloor + 2\lfloor \log\log c \rfloor + 3$ tests,
using at most $5\lfloor \log c \rfloor + 4$ units of water and a number of merge steps
quadratic in $\lfloor \log c \rfloor$.

The above corollary provides a very good tradeoff between number of tests and
number of units of water used to approximate $c$. Although our algorithm works
under a more complicated test model than that used in [4] (where the algorithm
may test the same unit of liquid more than once), this corollary represents an
improvement with respect to Damaschke's algorithm, in that it uses a smaller
number of tests while still using a logarithmic number of units of water.

In case our major concern is to lower the required number of tests, we must
carefully choose the most appropriate algorithm $A_j$. If, for example, we select an
algorithm $A_j$ with a very large $j$ when $c$ is small, then the algorithm performs
during stage(0) a large number of tests which contribute very little to the
search. We can use the same approach proposed by Bentley and Yao [2] in the
context of unbounded search, to first decide which value of $j$ is more appropriate
and then apply the selected algorithm $A_j$. Following their idea, we determine
$\ell^*(c) = \min h$ such that $k_h = \lfloor \log^{(h)} c \rfloor + 1$ is equal to 1.

In order to find $\ell^*(c)$ we test units with concentration ratios $1/g(1,h)$, $h \ge 1$,
until $c/g(1,h) < 1$ and then set $\ell^*(c)$ equal to $h$. The selected algorithm is $A_j$ with
$j = \ell^*(c) - 1$. The number of tests required to find $j$ is $j + 1$. Since it is known that
$k_j = 2$, tests in stage(0) of algorithm $A_j$ can be left out. The tests and merge
steps performed to find $\ell^*(c)$ constitute stage(0) of this new algorithm. The
successive $j - 1$ stages coincide with stage(1), ..., stage($j - 1$) of the algorithm
$A_j$. The given strategy finds $k_1 = \lfloor \log c \rfloor + 1$ using $j + 1 + \sum_{r=1}^{j-1} (k_{j-r+1} - 1)$ tests.
Notice that in order to find the appropriate $j$, we generate a sequence of units
whose concentration ratios are the first $g(1,j)$ terms of the sequence $S$ defined in
stage(0) of $A_j$. The sequence of units so generated contains all the units tested
in stage(1), ..., stage($j - 1$) of $A_j$. Hence, we have the following theorem:
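The search for $\ell^*(c)$ can be simulated in a few lines of Python. This is a sketch under two assumptions: the tested concentration ratios are taken to be $1/g(1,h)$ for $h = 1, 2, \ldots$, and a test is positive when the diluted concentration is at least 1. The generation of the tested units and the destruction of tested units are not modelled, and the function names are hypothetical.

```python
from fractions import Fraction


def g(i: int, j: int) -> int:
    """Tower function: g(i, 0) = i and g(i, j) = 2 ** g(i, j - 1) for j >= 1."""
    return i if j == 0 else 2 ** g(i, j - 1)


def ell_star(c):
    """Return (l*, number of tests): the smallest h with c / g(1, h) < 1."""
    h = 1
    tests = 0
    while True:
        tests += 1
        ratio = Fraction(1, g(1, h))   # assumed concentration ratio of the tested unit
        if ratio * c >= 1:             # positive response: c >= g(1, h)
            h += 1
        else:                          # negative response: c < g(1, h)
            return h, tests
```

For `c = 32` the tests on ratios 1/2, 1/4 and 1/16 are positive and the test on 1/65536 is negative, so `ell_star(32)` returns `(4, 4)`; the selected algorithm would be $A_3$, found with $j + 1 = 4$ tests.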

Theorem 2. There exists an algorithm A* which finds an interval of length at 
most 1 including an unknown concentration c > 32 with ^*(c) + [log^^ T 

[log^"^ cJt* * *T [log^^^(c)jT2 tests, using at most g{l,fA{c) — l)-\-^[logc\-\-^ 
units of water and at most ^(1, ^*(c) — 1) + | [log T ^ [log cJ T merge steps. 

In the rest of this section we describe a strategy for approximating c using only 
a constant number of tubes. 

Lemma 2. Given two units with concentration ratio rj2'^ and two units with 

concentration ratio r/^2^ such that rj < ^ <rk, there exists an algorithm to find 

2 

an interval of length at most i'^cluding c. The algorithm uses 

u T 4 tests, ^ T T 94 units of water, ^ T ^ — 14 merge operations and 4 
tubes. 

Proof. We describe a strategy which for each 1 < i < u T 1, preserves the 
following invariant: 

(a) at step i it holds a^_i < - < and one disposes of 4 units of liquid 
described by 2 x 2^“^+^a^_i, 2 x 2^“^+^N_i. 
Invariant (a) is true at step 1 with $a_0 = r_j$ and $b_0 = r_k$. Suppose invariant
(a) be true for i then we have 2 x x and a^_i < 

- < bi-i. Adding 2 units of water to each of the two concentrations we get 
4 X 2^“^a^_i,4 X Then we merge 2 units with concentration ratio 

2^“^a^_i and 2 units with concentration ratio getting 2 x 2 x 

, 4 X 2'^-^ a^-i+K-i ^ consider a diluting sequence which iteratively 

apply the following step until a unit with the desired concentration is obtained, 
step d: Add one unit of water to a given unit of liquid obtaining two units 
with half of the concentration of the given unit. Discard one of these two 
units and store the remaining one for the successive application of step d. 
Starting with 1 x 2'^~^ iteratively performing step d u — i times we 

obtain 1 x Such unit is tested and if the result of the test is positive 

then we set = a^_i and bi = otherwise we set 

bi = bi-\. We store 2 x a^2^“%2 x 6^2^“^ and the invariant is restored. 

After n — 5 steps, we get 2 x 2^a^i-5,2 x 2^bu~5 with au -5 < - < Adding 
2^ — 2 units of water to each of the two concentrations, we get 2® x 2® x 
Let riu -5 = 2®. For i = n — 4,...,uT4, we execute the following step: 

Merge x a^_i with x 6^_i. Test 1 x 

according to the result of this test. Set Ui = min{n^_i — — 1} 

and if i < w T 4, store Ui x a^, x bi. 

The above strategy performs w T 4 tests. After these w T 4 tests ^ has been 
confined inside an interval of size and by using an argument similar to 

the one used in the proof of Theorem 1 , it is possible to see that this is equivalent 

2 

to confining c inside an interval of size at most • D 



If we start with 2 x Vj2^ 



2 X Vk2^ with r 



2P 



and Vk 



2P~ 



then the 



above strategy finds an interval of length at most 22^+1,^ < 1 - 

The following algorithm determines $p - 1 = \lfloor \log c \rfloor$ for any $c > 8$.

Algorithm A : 

stage(O): This stage is very similar to stage(O) of A*, except for the fact 
that we do not store all the units generated during this stage. For j > 2, let 
Bj{i) = i — 1 T upper bound to the overall number 

of tests performed by stage(l),* • • ,stage(j — 1). Let 1 x ^,1 x , 2 x 
with p(l,/i — 1) < 2^ < p(l,h), {h > 1) denote the sequence of concentration 
ratios so far generated. Of this sequence we have discarded all the units but 
1 X ,1 X ,2 X If 2™ = g{l,h) for some 

h> 2 then one unit with concentration ratio 2^ = p(l,/i) is tested and conse- 
quently discarded. If the response of the test is positive then 1 x ^^(2) | 

is discarded. At the end we are left with 1 x 



^g(l,£*(c)-2)- 



_J 1 X 



2 s(lX*(<=)-l)-|logB(^,(„„( 2 )| ,2 X dilute 1 X 

with i^(r*(c)-i)(2) — 1 units of water obtaining i^(r*(c)-i) (2) units of concentration 

29 (lXn-)- 2 )- |log ( 2 ) l+log ( 2 ) • 

stage(r) (for r = 1, • • • , — 2) : this stage is similar to the corresponding 

stage of A*. The only difference is that the tested units are not generated during
stage(0). When a unit is needed, it is generated from a unit with concentration
ratio ^,(1, (2) i +iog (2) by successive iterations of step d. 

Since = 2, then i^r*(c)-i(2) is an upper bound on the overall num- 

ber of tests performed by stage(l),* • • ,stage(^*(c) — 2), and, as a consequence, 
we can generate as many test units as needed by these stages. Algorithm A 
uses at most g(l,£*(c) - 1) + i^p*(c)_i)(2) - 1 + (g(l,£*(c) - 1) - g(l,£*(c) - 
~ 1) units of Water and at most g[l,£*[c) — f) + f+ 
{g{l,£*{c) - 1) - g{l,£*{c) - 2))ECA5%*(c)-r) - 1) merge steps. 

Notice that stage(O) of algorithm A does not test units with concentration 
ratio ^ and ^ which are therefore still available. Consider a search strategy 
which first applies algorithm A to the initial sample to find p = [log cj T 1 and 
then executes the algorithm of Lemma 2 starting with the units 2 x and 2 x ^ 
which can be obtained from units 1 x 1 x Then the following theorem holds: 



Theorem 3. There exists an algorithm which finds an interval of length at most
1 including an unknown concentration c > 8 using 6 tubes and

- ^*(c)+ [log^^ T [log^^ (c)-2) [log^^^(c)J T2 number of tests, 

- at mosf im£^i_5l + 94 + ^(l,r(c) - + - f + 

{g{l,£*{c) - f) - g{l,£*{c) - 2)) cJ + 2 units of water, 

- at most blog y 2) 5([log^cJ-2) _ 

g{l,£*{c) - cJ + 2 merge steps. 

3 The Conservative Model 

In this section we consider the case when tests do not destroy the tested units. 
This is the same model considered by Damaschke [4]. As in Sect. 2, we describe 
a family of algorithms Aj which provide tradeoffs between the number of tests, 
the number of units of water and the number of merge steps. The first phase of 
the algorithm Aj performs the same steps executed by the algorithm Aj of Sect. 
2. The algorithm of Lemma 3 is then used to perform the binary search for c
inside the interval $[2^{\lfloor \log c \rfloor}, 2^{\lfloor \log c \rfloor + 1})$. Along the same lines as Lemma 1 we can

prove the following result: 

Lemma 3. Consider one unit of sample with unknown concentration c > 2,
which after an application of $A_j$ is known to lie inside the interval $[2^{p-1}, 2^p)$.
For any t < - 3 there exists an algorithm which determines an interval

2 

of length at most 2 ^+p-c containing c. The algorithm uses t tests and at most 
(p+^+i) ^ operations and at most t T 1 units of water. 

Corollary 3. Consider one unit of sample with unknown concentration c which
after an application of $A_j$ is known to lie inside the interval $[2^{p-1}, 2^p)$, with
p > 4. There exists an algorithm to find an interval containing c of length at
most 1. The algorithm uses p + 1 tests, ^ - | merge steps and at most

p + 2 units of water.

Theorem 4. For each j > 1 there exists an algorithm Aj which finds an interval 
of length at most 1 including an unknown concentration c > 8. Let I be the
smallest integer i such that cj > 1. The algorithm performs cj + 

[log^^^ cj + [log^-^^ cj + / + 3 tests and uses at most g{kj,j — 
1) + [log cJ + 3 units of water and at most g{kj , j — 1 ) + | [log cJ ^ ^ [log cJ + ^ 

merge steps. 

Proof. Algorithm Aj is obtained by applying algorithm Aj followed by an ap- 
plication of the algorithm of Corollary 3. □ 

In the special case j = 1, the algorithm $A_1$ attains the same performance as
Damaschke's algorithm [4]. Moreover, setting j = 2 in the statement of Theorem
4 yields the following improvement: 

Corollary 4. There is an algorithm which finds an interval of length at most
1 including an unknown concentration c > 8 with $\lfloor \log c \rfloor + 2\lfloor \log\log c \rfloor + 3$ tests,
using at most $3\lfloor \log c \rfloor + 3$ units of water and at most | [log cJ ^ ^ [log cJ + ^

merge steps.
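
To get a feel for the test and water bounds of Corollary 4, namely $\lfloor \log c \rfloor + 2\lfloor \log\log c \rfloor + 3$ tests and $3\lfloor \log c \rfloor + 3$ units of water, the following sketch evaluates them for a given c. It assumes that logarithms are taken to base 2, and the function name `corollary4_costs` is hypothetical; the merge-step bound is not evaluated.

```python
from math import floor, log2


def corollary4_costs(c):
    """Evaluate the test and water bounds of Corollary 4 for c > 8.

    Assumption: all logarithms are base 2.
    """
    if c <= 8:
        raise ValueError("Corollary 4 is stated for c > 8")
    log_c = floor(log2(c))
    loglog_c = floor(log2(log2(c)))
    tests = log_c + 2 * loglog_c + 3
    water = 3 * log_c + 3
    return tests, water
```

For instance, with c = 1000 we have $\lfloor \log c \rfloor = 9$ and $\lfloor \log\log c \rfloor = 3$, so the bounds evaluate to 18 tests and 30 units of water.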

As in Sect. 2 we can optimize the choice of j to get the following theorem: 

Theorem 5. There exists an algorithm A* which finds an interval of length at 
most 1 including an unknown concentration c > 8 with ^*(c) + [log^^ + 

[log^"^ cJ T* • • + [log^^^(c)J +2 tests, using at most ^(1, £*(c) — 1)+ [log cJ +3 

units of water and at most ^(1, ^*(c) - 1) + | [log cj^ + ^ [log cJ + ^ merge steps.

Abstract. Domain decomposition is one of the most effective and popular
parallel computing techniques for solving large scale numerical systems.
In the special case when the amount of computation in a subdomain is
proportional to the volume of the subdomain, domain decomposition
amounts to minimizing the surface area of each subdomain while
dividing the volume evenly. Motivated by this fact, we study the following
min-max boundary multi-way partitioning problem: Given a graph
$G$ and an integer $k > 1$, we would like to divide $G$ into $k$ subgraphs
$G_1, \ldots, G_k$ (by removing edges) such that (i) $|G_i| = O(|G|/k)$ for all
$i \in \{1, \ldots, k\}$; and (ii) the maximum boundary size of any subgraph
(the set of edges connecting it with other subgraphs) is minimized.

We provide an algorithm that, given $G$, a well-shaped mesh in $d$ dimensions,
finds a partition of $G$ into $k$ subgraphs $G_1, \ldots, G_k$, such that for all
$i$, $G_i$ has $O(|G|/k)$ vertices and the number of edges connecting $G_i$ with
the other subgraphs is $O((|G|/k)^{1-1/d})$. Our algorithm can find such a
partition in $O(|G| \log k)$ time. Finally, we extend our results to vertex-
weighted and vertex-based graph decomposition. Our results can be used
to simultaneously balance the computational and memory requirements
on a distributed-memory parallel computer without sacrificing the
communication overhead.



1 Introduction 

Domain decomposition is one of the most effective and popular techniques for
solving large scale numerical systems on parallel computers [6,8]. This technique
is used for finding solutions to partial differential equations by iteratively solving
subproblems defined on smaller subdomains. Thus, it is a divide-and-conquer
technique. When applying this technique, it is desirable to decompose the domain
into subdomains with approximately the same computational work associated
to them (for balancing the load) and to minimize communication among
subdomains (for reducing total communication and communication bottlenecks) [6].

We first focus on the special case where the amount of computational work
associated to a subdomain is proportional to the volume of the subdomain. 
Here, domain decomposition amounts to minimizing the surface area of each 
subdomain while dividing the volume evenly. 

The ratio of the measure of the boundary to the measure of the computational 
work of a subdomain is sometimes referred to as the surface-to-volume ratio 
or the communication-to-computation ratio of the subdomain. Minimizing this
ratio plays a key role in efficient parallel iterative methods [8]. 

To solve partial differential equations numerically, one discretizes the domain 
into a mesh of well-shaped elements such as simplices or hexahedral elements. 
The density of mesh points, and hence the size of mesh elements, may vary 
within the domain giving rise to unstructured meshes [4,13,17]. Obtaining good 
partitions of unstructured meshes is, in general, significantly more challenging 
than partitioning their uniform/regular counterparts. 

The main result established in this work is that every d-dimensional well-shaped
unstructured mesh has a k-way partition where the surface-to-volume
ratio of every sub-mesh is as small as that of a regular d-dimensional grid that 
has the same number of nodes. 

In Section 2, we introduce the problem of min-max-boundary multi-way 
partitioning. In Section 3, we describe a multi-way partitioning algorithm and 
present our main result. In Section 4, we extend the results of Section 3 to graphs 
with non-negative weights at each vertex. More precisely, we propose an efficient 
algorithm that partitions vertex-weighted graphs into subgraphs of similar total 
weight and vertex size and at the same time achieves low surface-to-volume 
ratio in all subgraphs. Such a multi-way partitioning algorithm can be used to
simultaneously balance the computational work and the memory requirements 
on a distributed-memory parallel computer without sacrificing communication 
overhead. In Section 5, we address the vertex-based partitioning problem in 
order to handle graphs with large vertex degree. 

2 Multi-way Partitioning 

A bisection of a graph $G$ is a division of its vertices into two disjoint subsets
whose sizes differ by at most one. In general, for every integer $k > 1$, a $k$-way
partition of $G$ is a division of its vertex set into $k$ disjoint subsets of size
$\lfloor |G|/k \rfloor$ or $\lceil |G|/k \rceil$, where $|G|$ denotes the number of vertices in $G$.

Partitions that evenly divide the vertices are not necessary in most applications
[15]. In most cases, balanced partitions suffice. Given a graph $G = (V, E)$,
an integer $k > 1$ and a real number $\beta \ge 1$, a partition $P = \{G_1, \ldots, G_k\}$
is a $(\beta, k)$-partition of $G$ if $|V_i| \le \beta \lceil |G|/k \rceil$ for all $i \in \{1, \ldots, k\}$, where $V_i$ is
the vertex set of $G_i$. We denote by $\partial_V(G_i)$ the set of boundary-vertices of $G_i$,
i.e. the set of vertices in $V_i$ that are connected by an edge of $G$ to a vertex not
in $V_i$; we denote by $\partial_E(G_i)$ the boundary-edges of $G_i$, i.e. the set of edges in $G$
exactly one of whose endpoints is in $V_i$.

We consider the following two costs associated with a $(\beta, k)$-partition $P$:

$$\text{total-boundary}_E(P) = \frac{1}{2} \sum_{i=1}^{k} |\partial_E(G_i)|,$$

$$\text{max-boundary}_E(P) = \max_{1 \le i \le k} |\partial_E(G_i)|.$$

The problem of min-total-boundary (multi-way) partitioning is to construct a
$(\beta, k)$-partition that minimizes total-boundary, while min-max-boundary
(multi-way) partitioning is to construct a $(\beta, k)$-partition that minimizes
max-boundary.

3 Bounds for Min-Max-Boundary Partitioning

We first introduce some terminology. Let $\mathcal{G}$ be a family of graphs that is closed
under the subgraph operation, i.e. every subgraph of a graph $G \in \mathcal{G}$ belongs to
$\mathcal{G}$. For $0 < \alpha < 1$, we say $\mathcal{G}$ has an $n^{\alpha}$-separator theorem or $\mathcal{G}$ is $n^{\alpha}$-separable if
there is a constant $c$ such that every $n$-node graph in $\mathcal{G}$ has a bisection of cut-size
at most $c n^{\alpha}$. Moreover, we refer to the latter type of bisections as $n^{\alpha}$-separators.
(More information concerning small separators can be found in [11,16].)

We denote by $\mathcal{G}(\alpha)$ a family of graphs that is $n^{\alpha}$-separable and closed under
the subgraph operation. Examples include bounded-degree planar graphs
($O(n^{1/2})$-separable) [11], graphs with bounded genus ($n^{1/2}$-separable) [10],
graphs with no $h$-clique minor for a constant $h$ ($n^{1/2}$-separable) [1], well-shaped
meshes in $\mathbb{R}^d$ and nearest neighbor graphs in $\mathbb{R}^d$ ($n^{1-1/d}$-separable) [13,14,16].

The min-total-boundary partitioning problem has been addressed in the
literature. The following lemma has been shown in [15].

Lemma 1. Let $k$ be an integer such that $k > 1$. Then, for every graph $G$ in
$\mathcal{G}(\alpha)$ a $k$-way partition $P$ such that $\text{total-boundary}_E(P) = O(k^{1-\alpha}|G|^{\alpha})$ can be
constructed.

A closely related problem is that of bifurcators [5]. A graph $G$ has an $(F_0, F_1, \ldots, F_r)$-
decomposition tree if $G$ can be decomposed into two subgraphs $G_0$ and $G_1$ by
removing no more than $F_0$ edges from $G$, and in turn, both $G_0$ and $G_1$ can
be decomposed into smaller subgraphs by removing no more than $F_1$ edges
from each, and so on. An $n$-node graph has an $\alpha$-bifurcator of size $F$ if it has
an $(F, F/\alpha, F/\alpha^2, \ldots, 1)$-decomposition tree. Bhatt and Leighton [5] showed that
every graph in $\mathcal{G}(\alpha)$ has a $\sqrt{2}$-bifurcator of size $O(\sqrt{n})$ if $\alpha \le 1/2$, and has a
$\sqrt{2}$-bifurcator of size $O(n^{\alpha})$ if $\alpha > 1/2$.

The following is the main result of this section which establishes a separator
theorem for min-max-boundary partitioning.

Theorem 2 (main). Let $k$ be an integer such that $k > 1$. Then, every bounded-
degree graph $G$ in $\mathcal{G}(\alpha)$ has a $(2, k)$-partition $P$ such that
$\text{max-boundary}_E(P) = O((|G|/k)^{\alpha})$.

Notice that Theorem 2 implies Lemma 1. Thus, our main result can be seen as
an extension of the result of [15] cited above.
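
The two costs of Section 2 can be computed directly from an edge list. The sketch below is only an illustration: the function name `boundary_costs` and the representation of a partition as a dictionary from vertices to part indices are assumptions made here. It also makes visible why a bound on max-boundary gives a bound on total-boundary: each cut edge is counted once in each of its two parts, so the total is at most k/2 times the maximum.

```python
def boundary_costs(edges, part_of):
    """Return (total-boundary, max-boundary) of an edge-based partition.

    edges   -- iterable of (u, v) pairs
    part_of -- dict mapping each vertex to the index of its part
    """
    boundary = {}  # part index -> number of boundary-edges of that part
    for u, v in edges:
        pu, pv = part_of[u], part_of[v]
        if pu != pv:
            boundary[pu] = boundary.get(pu, 0) + 1
            boundary[pv] = boundary.get(pv, 0) + 1
    total = sum(boundary.values()) // 2  # every cut edge was counted twice
    worst = max(boundary.values(), default=0)
    return total, worst
```

On the path 0-1-2-3-4-5 split into the parts {0,1}, {2,3}, {4,5}, the two cut edges give a total-boundary of 2 and a max-boundary of 2 (attained by the middle part), which respects the bound 2 <= (3/2) * 2.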

3.1 Simultaneous Partition of Vertices and Boundary 

We first examine a simple example. Consider a $\sqrt{n} \times \sqrt{n}$ grid in two dimensions
where we assume both $k$ and $\sqrt{n}$ are powers of two. One way of partitioning
the grid is to divide it into two $\sqrt{n} \times \sqrt{n}/2$ grids by removing the edge in the
middle of every row (a $\sqrt{n}$-separator), and then divide each of the two sub-
grids into two $\sqrt{n}/2 \times \sqrt{n}/2$ sub-grids by removing the middle edge of every
column. This process can continue by recursively dividing the sub-grids until $k$
disconnected sub-grids are found. Clearly, each final sub-grid has $n/k$ vertices
and at most $4\sqrt{n/k}$ boundary-edges. However, the naive recursive application
of the separator theorem of Lipton and Tarjan does not, in general, guarantee
the generation of a $k$-way partition $P$ with $\text{max-boundary}_E(P) = O(\sqrt{n/k})$
for all bounded-degree $n$-node planar graphs. The following somewhat stronger
version of the small-separator theorem was used in partitioning the 2D grid:
at every stage of the divide-and-conquer, (1) each subgraph was divided into
two subgraphs of the same size by removing a set of edges whose size is on the
order of the square root of the size of the subgraph (as in the standard Lipton-Tarjan
theorem), and (2) the boundary-vertices of the subgraphs were divided evenly.
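
The grid example can be reproduced with a short sketch. The helper names `grid_blocks` and `boundary_edges` and the representation of a sub-grid by half-open coordinate ranges are assumptions made for illustration. The check below takes k to be a power of four, so that the final sub-grids are square and an interior sub-grid has exactly $4\sqrt{n/k}$ boundary-edges.

```python
def grid_blocks(side, k):
    """Recursively halve a side x side grid into k sub-grids.

    side and k are assumed to be powers of two. Cuts alternate between
    splitting every row in the middle and splitting every column in the
    middle. Each sub-grid is returned as (x0, x1, y0, y1), half-open.
    """
    blocks = [(0, side, 0, side)]
    cut_x = True
    while len(blocks) < k:
        halves = []
        for x0, x1, y0, y1 in blocks:
            if cut_x:
                xm = (x0 + x1) // 2
                halves += [(x0, xm, y0, y1), (xm, x1, y0, y1)]
            else:
                ym = (y0 + y1) // 2
                halves += [(x0, x1, y0, ym), (x0, x1, ym, y1)]
        blocks = halves
        cut_x = not cut_x
    return blocks


def boundary_edges(block, side):
    """Count grid edges with exactly one endpoint inside the block."""
    x0, x1, y0, y1 = block
    count = 0
    if x0 > 0:
        count += y1 - y0  # edges leaving through the left side
    if x1 < side:
        count += y1 - y0  # right side
    if y0 > 0:
        count += x1 - x0  # bottom side
    if y1 < side:
        count += x1 - x0  # top side
    return count
```

With `side = 16` (so n = 256) and `k = 16`, every sub-grid is a 4 x 4 block with 16 vertices, and the interior blocks have 16 boundary-edges, matching $4\sqrt{n/k} = 16$.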

Our method is motivated by the latter observation, more formally given 
below. 

Lemma 3. Let $k > 1$ be a power of two. Let $G$ be a bounded-degree graph in
$\mathcal{G}(\alpha)$ such that $|G|$ is a power of two. If in every stage of a divide-and-conquer
partitioning procedure the vertices and boundary-vertices of each subgraph are
evenly divided by a separator, whose size is on the order of the $\alpha$-th power of
the size of the subgraph, then the divide-and-conquer procedure, on input $G$, will
generate a $k$-way partition whose max-boundary is $O((|G|/k)^{\alpha})$.

Proof: Let $s(i)$ be the maximum possible number of boundary-vertices for
graphs at level $i$ of the divide-and-conquer partitioning procedure. It follows
from the assumption of the lemma that there exists a constant $c$ such that
$s(1) \le c\,(|G|/2)^{\alpha}$ and, if $i > 1$,

$$s(i) \le s(i-1)/2 + c \cdot (|G|/2^{i})^{\alpha} \le c \cdot (|G|/2^{i})^{\alpha} \sum_{j=0}^{i-1} 2^{-(1-\alpha) j}.$$

Since $\alpha < 1$, the geometric sum is bounded by a constant and we get that
$s(i) = O((|G|/2^{i})^{\alpha})$. Fixing $i = \log k$, we have $s(i) = O((|G|/k)^{\alpha})$.
The lemma follows from the assumption that $G$ is a bounded-degree graph. □

Unfortunately, we may not always be able to find a small separator that
evenly divides both vertices and boundary-vertices. We show that this
simultaneous partition can be achieved approximately.

A variation of the following lemma was given in Lipton and Tarjan [11].

Lemma 4. Let $G = (V, E)$ be a graph in $\mathcal{G}(\alpha)$ such that $|G|$ is a power of two.
Let $S$ be a subset of $V$. Then, one can find an $O(|G|^{\alpha})$-separator that divides $G$
into two subgraphs $G_1 = (V_1, E_1)$ and $G_2 = (V_2, E_2)$ such that $|S \cap V_1| = \lfloor |S|/2 \rfloor$
and $|S \cap V_2| = \lceil |S|/2 \rceil$.

Lemma 5. Let $0 < \epsilon < 1/2$. Let $G = (V, E)$ be a graph in $\mathcal{G}(\alpha)$ such that $|G|$
is a power of two. Let $S \subseteq V$ be a subset of $V$. Then, one can find an $O(|G|^{\alpha})$-
separator that divides $G$ into two subgraphs $G_1 = (V_1, E_1)$ and $G_2 = (V_2, E_2)$
such that $|S \cap V_1| = \lfloor |S|/2 \rfloor$, $|S \cap V_2| = \lceil |S|/2 \rceil$, and $|V_1|, |V_2| \le (1 + \epsilon)|G|/2$.

Proof: Let $t$ be the smallest integer such that $1/2^{t} \le \epsilon$. Divide $G$ into $T = 2^{t}$
subgraphs $G'_1, \ldots, G'_T$ of equal vertex size by recursively using $n^{\alpha}$-separators.
By Lemma 1 this can be done so that the total number of edges removed is
$O(T^{1-\alpha}|G|^{\alpha}) = O(|G|^{\alpha})$. Now, divide each $G'_i = (V'_i, E'_i)$ into two subgraphs
$G'_{i,1}$ and $G'_{i,2}$ by Lemma 4, so that $S \cap V'_i$ is evenly partitioned. Without loss of
generality assume $|G'_{i,1}| \le |G'_{i,2}|$. Consider the following procedure for dividing
$G$ into two subgraphs $G_1$ and $G_2$ satisfying the conditions stated in the lemma:

1. Let $G_1$, $G_2$ be empty graphs.

2. For $i = 1$ to $T$:

If $|G_1| \ge |G_2|$, then let $G_1 = G_1 \cup G'_{i,1}$ and $G_2 = G_2 \cup G'_{i,2}$; otherwise
let $G_1 = G_1 \cup G'_{i,2}$ and $G_2 = G_2 \cup G'_{i,1}$.
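
The size bookkeeping of this procedure (the currently larger side receives the smaller half of the next piece) can be sketched as follows. Only vertex counts are tracked; the function name `balance_pieces` and the encoding of each piece as a pair (smaller half, larger half) are assumptions made for illustration.

```python
def balance_pieces(pieces):
    """Greedily distribute the two halves of each piece between two sides.

    pieces -- list of (small, large) vertex counts with small <= large,
              one pair per piece, all pieces having the same total size
    Returns the final sizes (u, v) of the two sides.
    """
    u = v = 0
    for small, large in pieces:
        if u >= v:
            # the first side is at least as large: give it the smaller half
            u, v = u + small, v + large
        else:
            u, v = u + large, v + small
    return u, v
```

For four pieces of 8 vertices each, split as (1, 7), (3, 5), (4, 4) and (0, 8), the loop ends with sides of size 18 and 14; the difference stays within the size of a single piece throughout.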

First, $S$ is evenly divided between $G_1$ and $G_2$. Moreover, there are at most
$O(|G|^{\alpha})$ edges of $G$ connecting $G_1$ and $G_2$. We now show that $|V_1|, |V_2| \le
(1 + 1/2^{t})|G|/2$. We will prove that at the end of every iteration of the for-
loop in the above procedure, $\bigl|\,|G_1| - |G_2|\,\bigr| \le |G|/2^{t}$. The proof is by induction on
the for-loop counter $i$. Let $u_i$ and $v_i$ be the size of $G_1$ and $G_2$ after the $i$-th
iteration, respectively. The claim is true when $i = 1$ because $|G'_{1,1}| + |G'_{1,2}| = |G|/2^{t}$
and hence $|u_1 - v_1| = \bigl|\,|G'_{1,1}| - |G'_{1,2}|\,\bigr| \le |G|/2^{t}$. By the induction hypothesis, we
get $|u_{i-1} - v_{i-1}| \le |G|/2^{t}$. WLOG, assume $u_{i-1} \ge v_{i-1}$. Thus $u_i = u_{i-1} + |G'_{i,1}|$
and $v_i = v_{i-1} + |G'_{i,2}|$. If $u_i \ge v_i$, then since $|G'_{i,1}| \le |G'_{i,2}|$, we get that

$$u_i - v_i = (u_{i-1} + |G'_{i,1}|) - (v_{i-1} + |G'_{i,2}|)
= (|G'_{i,1}| - |G'_{i,2}|) + (u_{i-1} - v_{i-1}) \le u_{i-1} - v_{i-1} \le |G|/2^{t}.$$

If $u_i < v_i$, then since $|G'_{i,1}| + |G'_{i,2}| = |G|/2^{t}$, we get that

$$v_i - u_i = (v_{i-1} + |G'_{i,2}|) - (u_{i-1} + |G'_{i,1}|)
= (|G'_{i,2}| - |G'_{i,1}|) + (v_{i-1} - u_{i-1}) \le |G'_{i,2}| - |G'_{i,1}| \le |G|/2^{t}.$$



□

3.2 An Algorithm for and the Proof of the Main Theorem

Let $G = (V, E)$ be a graph. Let $\theta = |G|/k$ and $\epsilon$ be a constant satisfying
the conditions of Lemma 5. Assume $|G|$ is a power of two and that we know a
bisection of $G$ of cut size $O(|G|^{\alpha})$. Consider the following recursive procedure:

Algorithm: min-max-boundary-partition($G$, $\theta$, $\epsilon$)

1. If $|G| \le \theta$ then return $G$.

2. Apply the procedure of Lemma 5 to divide $G$ into $G_1 = (V_1, E_1)$ and
$G_2 = (V_2, E_2)$, where $S$ is chosen to be the set of all boundary-vertices
in $G$ (at the first level of the recursion let $S$ be the boundary-vertices
of the known bisection of $G$).

3. Let the set of boundary-vertices of $G_1$ and $G_2$ be those boundary-
vertices inherited from $G$ and those produced by the partition of the
previous step.

4. Recursively call min-max-boundary-partition($G_1$, $\theta$, $\epsilon$).

5. Recursively call min-max-boundary-partition($G_2$, $\theta$, $\epsilon$).

6. If more than $k$ subgraphs were generated, repeatedly merge the two
smallest subgraphs until only $k$ subgraphs remain.

Note that the partitioning procedure of Step 2 evenly divides the boundary-
vertices and approximately divides the vertex set. We now prove our main
separator theorem.

Proof: The recursive procedure above defines a separator tree $T$.
The size of the subgraph at a leaf is at least $(1 - \epsilon)|G|/2k$ but at most $|G|/k$.
The graph associated to the root of the separator tree is $G$ itself. Let the level of
a node in the tree be its distance to the root. Let $c_0$ be a constant such that every
graph $H$ in $\mathcal{G}(\alpha)$ has a separator of cut size at most $c_0 |H|^{\alpha}$. We now prove, by
induction on the levels of the separator tree, that there is a constant $c$ such that
for every node $v$ of $T$, $|\partial_V(G_v)| \le c\,|G_v|^{\alpha}$. The claim is true for the two children
of the root, provided $c \ge 2^{\alpha} c_0$, since we can find a bisection of $G$ of size at most
$c_0 |G|^{\alpha}$. Assume that the claim is true for every internal node $v$ at level $i - 1$.
Let $u$ and $w$ be the two children of $v$. The algorithm divides $G_v$ into $G_u$ and
$G_w$. Let $c_1$ be the constant hidden in the O-notation of Lemma 5. Hence, if $G_x$
denotes either $G_u$ or $G_w$, we have that

$$|\partial_V(G_x)| \le |\partial_V(G_v)|/2 + c_1 |G_v|^{\alpha}
\le (c/2 + c_1)\,|G_v|^{\alpha}
\le \bigl(2^{\alpha}(c/2 + c_1)/(1 - \epsilon)^{\alpha}\bigr)\,|G_x|^{\alpha}.$$

The last inequality follows since Lemma 5 ensures that $|G_x| \ge (1 - \epsilon)|G_v|/2$.
To conclude, recall that $G$ is a bounded-degree graph and choose $c$ such that
$c \ge 2^{\alpha}(c/2 + c_1)/(1 - \epsilon)^{\alpha}$, i.e.,

$$c \cdot \bigl((1 - \epsilon)^{\alpha}/2^{\alpha} - 1/2\bigr) \ge c_1.$$

This can be done as long as $\epsilon < 1 - 2^{1 - 1/\alpha}$.



□

Corollary 6. Let $k$ be an integer such that $k > 1$. Then, every $n$-node well-
shaped mesh or nearest neighbor graph in $\mathbb{R}^d$ has a $(2, k)$-partition $P$ with
$\text{max-boundary}_E(P) = O((n/k)^{1-1/d})$; every $n$-node bounded-degree planar
graph, graph with bounded genus, and graph with bounded forbidden minor has
a $(2, k)$-partition $P$ with $\text{max-boundary}_E(P) = O(\sqrt{n/k})$.
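
The choice of the constant $c$ at the end of the proof of Theorem 2 can be checked numerically. The sketch below assumes the condition $c \cdot ((1-\epsilon)^{\alpha}/2^{\alpha} - 1/2) \ge c_1$ with $\epsilon < 1 - 2^{1-1/\alpha}$ and returns the smallest admissible $c$; the function name `boundary_constant` is hypothetical.

```python
def boundary_constant(alpha, eps, c1):
    """Smallest c with c * ((1 - eps)**alpha / 2**alpha - 1/2) >= c1.

    Requires 0 < alpha < 1 and 0 < eps < 1 - 2**(1 - 1/alpha); outside
    this range the factor multiplying c is not positive.
    """
    if not 0 < alpha < 1:
        raise ValueError("alpha must lie strictly between 0 and 1")
    limit = 1 - 2 ** (1 - 1 / alpha)
    if not 0 < eps < limit:
        raise ValueError("eps must lie strictly between 0 and %g" % limit)
    return c1 / ((1 - eps) ** alpha / 2 ** alpha - 0.5)
```

For planar-like graphs ($\alpha = 1/2$) the admissible range is $\epsilon < 1/2$. With $\epsilon = 1/4$ and $c_1 = 1$ the returned constant is roughly 8.9, and plugging it back into $2^{\alpha}(c/2 + c_1)/(1-\epsilon)^{\alpha}$ reproduces $c$, so the inductive step closes.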

4 Partitioning Weighted Graphs 

The following are two examples where partitioning of weighted graphs is needed.
In adaptive numerical formulation, in order to efficiently achieve a desired
solution accuracy, sophisticated adaptive strategies that vary the solution or
discretization technique within each finite element are used. For example, the
p-refinement technique applies a higher order basis function in those elements
having a rapidly changing solution or a large error. The h-refinement technique
involves subdivision of the mesh elements themselves. (The p- and hybrid hp-
refinement [3] techniques can be used to efficiently find accurate solutions to
problems in areas such as computational plasticity.) Strategies such as p- and
hp-refinement may cause the work to vary at different elements in the domain.
This variation may be as high as one or two orders of magnitude [3].

In N-body simulations for non-uniformly distributed particles [2,7,18],
particles will be grouped into clusters based on their geometric location. The
interaction between particles in a pair of well-separated clusters will be approximated
by the interaction between their clusters. The amount of calculation associated
with some cluster/particle may be much higher than the amount of calculation
needed in some other cluster/particle.

Consider a graph where every vertex is assigned a weight that is propor- 
tional to the amount of computation needed at the vertex. Let the total weight 
of a graph be the sum of the weight of its vertices. Rather than partitioning 
the graph into subgraphs of equal vertex size we would now like to partition 
it into subgraphs with “equal” total weight. However, partitioning according to 
weights alone may cause an imbalance in the size of the resulting subgraphs. In 
some applications, this may cause an imbalance on local memory requirements 
since, in general, all vertices need a similar amount of storage even though the 
computational work associated to them may vary. We consider the problem of 
partitioning vertex-weighted graphs into subgraphs and simultaneously
balancing the total weight and the size of the vertex set of the resulting subgraphs while
minimizing the max-boundary.



4.1 Simultaneous Partition of Vertices and Weights 

Let $G = (V, E, w)$ be a vertex-weighted graph, where $w : V \to \mathbb{R}^+$ is a positive
weight vector. For any subgraph $G' = (V', E')$ of $G$, we denote by $w(G')$ or
$w(V')$ the total weight of $G'$, i.e. $w(G') = w(V') = \sum_{v \in V'} w(v)$.

A variant of the following lemma was given in Lipton and Tarjan [11].

Lemma 7. Let $0 < \lambda < 1/2$. Let $G = (V, E)$ be a bounded-degree graph in $\mathcal{G}(\alpha)$
and $w : V \to \mathbb{R}^+$ be a weight-vector such that $w(v) \le \lambda w(G)$ for all $v \in V$.
Then, one can find an $O(|G|^{\alpha})$-separator that divides $G$ into two subgraphs $G_1 =
(V_1, E_1)$ and $G_2 = (V_2, E_2)$ such that $w(G_1), w(G_2) \le (1 + \lambda) w(G)/2$.

Lemma 8. Let $0 < \epsilon < 1/2$ and $0 < \lambda < 1/2$. Let $G = (V, E)$ be a bounded-
degree graph in $\mathcal{G}(\alpha)$ such that $|G|$ is a power of two. Let $w : V \to \mathbb{R}^+$ be a
weight-vector such that $w(v) \le \lambda w(G)$ for all $v \in V$. Then, one can find an
$O(|G|^{\alpha})$-separator that divides $G$ into two subgraphs $G_1 = (V_1, E_1)$ and $G_2 =
(V_2, E_2)$ such that $|V_1|, |V_2| \le (1 + \epsilon)|V|/2$ and $w(G_1), w(G_2) \le (1 + \lambda) w(G)/2$.

The proof is similar to that of Lemma 5. 

Let $k$ be an integer such that $k > 1$. Let $G = (V, E, w)$ be a vertex-weighted
graph. Let $P = \{G_1, \ldots, G_k\}$ be a collection of subgraphs $G_i = (V_i, E_i)$ of $G$
that have disjoint vertex sets. We say that $P$ is a $(\beta, \delta, k)$-partition of $G$ if the
$V_i$'s cover all of $V$, and for all $i \in \{1, \ldots, k\}$ it holds that $|V_i| \le \beta \lceil |G|/k \rceil$ and
$w(G_i) \le \delta w(G)/k$.
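
The definition above translates into a simple membership test. In the sketch below the function name `is_weighted_partition`, the list-of-vertex-lists representation of $P$ and the weight dictionary are assumptions made for illustration.

```python
from math import ceil


def is_weighted_partition(parts, weight, beta, delta):
    """Check whether `parts` is a (beta, delta, k)-partition, k = len(parts).

    parts  -- list of vertex collections, one per subgraph
    weight -- dict mapping every vertex of the graph to its positive weight
    """
    k = len(parts)
    listed = [v for part in parts for v in part]
    # the vertex sets must be pairwise disjoint and cover all vertices
    if len(listed) != len(set(listed)) or set(listed) != set(weight):
        return False
    size_cap = beta * ceil(len(weight) / k)
    weight_cap = delta * sum(weight.values()) / k
    return all(
        len(part) <= size_cap
        and sum(weight[v] for v in part) <= weight_cap
        for part in parts
    )
```

For four vertices with weights 1, 1, 1 and 5, the split {0, 1, 2}, {3} balances the weight reasonably well (3 against 5) but not the sizes: it is a (2, 1.5, 2)-partition and not a (1, 1.5, 2)-partition.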

The following corollary follows by recursively applying Lemma 8. 

Corollary 9. Let $0 < \epsilon < 1/2$ and $0 < \lambda < 1/2$. Let $k$ be an integer such that
$k > 1$. Let $G = (V, E)$ be a graph in $\mathcal{G}(\alpha)$ such that $|G|$ is a power of two. Let
$w : V \to \mathbb{R}^+$ be a weight-vector such that $w(v) \le \lambda w(G)$ for all $v \in V$. Then, a
$(1 + \epsilon, 1 + \lambda, k)$-partition $P$ of $G$ such that $\text{total-boundary}_E(P) = O(k^{1-\alpha}|G|^{\alpha})$
can be constructed.



4.2 Min-Max-Boundary Partition of Weighted Graphs

Theorem 10. Let $k$ be an integer such that $k > 1$ and $\lambda$ be a constant such
that $0 < \lambda < 1/2$. Let $G = (V, E)$ be a bounded-degree graph in $\mathcal{G}(\alpha)$ such
that $|G|$ is a power of two. Let $w : V \to \mathbb{R}^+$ be a weight-vector such that
$w(v) \le \lambda w(G)$ for all $v \in V$. Then, a $(2, 1 + \lambda, k)$-partition $P$ of $G$ such that
$\text{max-boundary}_E(P) = O((|G|/k)^{\alpha})$ can be constructed.

To prove the latter theorem we will follow the same argument used in Section 3.2 
to prove Theorem 2. The algorithm recursively applies the following lemma to 
simultaneously partition weights, vertices, and boundary. Details will be given 
in the full version. 

Lemma 11. Let $0 < \epsilon < 1/2$ and $0 < \lambda < 1/2$. Let $G = (V, E)$ be a bounded-
degree graph in $\mathcal{G}(\alpha)$ such that $|G|$ is a power of two and $w : V \to \mathbb{R}^+$ be a
weight-vector such that $w(v) \le \lambda w(G)$ for all $v \in V$. Let $S \subseteq V$ be a subset of
$V$. Then, one can find an $O(|G|^{\alpha})$-separator that divides $G$ into two subgraphs
$G_1 = (V_1, E_1)$ and $G_2 = (V_2, E_2)$ such that $|S \cap V_1| = \lfloor |S|/2 \rfloor$, $|S \cap V_2| = \lceil |S|/2 \rceil$,
$|V_1|, |V_2| \le (1 + \epsilon)|G|/2$, and $w(G_1), w(G_2) \le (1 + \lambda) w(G)/2$.

The proof follows the basic idea developed in the proof of Lemma 5.

5 Vertex-Based Decomposition

An alternative way to partition a graph is by removing vertices rather than by
removing edges. Vertex-based decomposition has been used in nested dissection
for solving sparse linear systems [12] and overlapping domain decomposition [8].
Lipton, Rose, and Tarjan [12] gave the following scheme to recursively divide a
graph using vertex separators:

Algorithm: LRT($G$, $\theta$)

1. If $|G| \le \theta$ then return $G$.

2. Find a small vertex separator $C \subseteq V$ of $G = (V, E)$ that partitions
$V$ into two disjoint subsets $A$ and $B$ such that $|A| \le |V|/2$ and
$|B| \le |V|/2$.

3. Let $G_1$ and $G_2$ be the subgraphs of $G$ induced by the vertex sets
$A \cup C$ and $B \cup C$ respectively.

4. Recursively call LRT($G_1$, $\theta$).

5. Recursively call LRT($G_2$, $\theta$).

The procedure above decomposes the input graph $G$ into $k$ subgraphs $G_1, \ldots, G_k$,
for some $k > 1$. Vertices used in separators may occur in two or more subgraphs.
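
As a toy instance of this scheme, the sketch below runs the recursion on a path graph, where the middle vertex is a one-vertex separator. Restricting the input to paths, the function name `lrt_path` and the requirement that the threshold be at least 2 (so that the recursion always makes progress) are assumptions made for illustration.

```python
def lrt_path(vertices, theta):
    """Recursively decompose a path, given as a list of consecutive vertices.

    The separator C is the middle vertex; A and B are the vertices before
    and after it, and the two recursive calls receive A + C and C + B.
    """
    if theta < 2:
        raise ValueError("theta must be at least 2 for the recursion to stop")
    if len(vertices) <= theta:
        return [vertices]
    mid = len(vertices) // 2
    left = vertices[:mid + 1]   # A together with the separator vertex
    right = vertices[mid:]      # the separator vertex together with B
    return lrt_path(left, theta) + lrt_path(right, theta)
```

Calling `lrt_path(list(range(9)), 3)` yields the pieces [0, 1, 2], [2, 3, 4], [4, 5, 6] and [6, 7, 8]; the separator vertices 2, 4 and 6 each occur in two pieces.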

Motivated by this procedure, we define the following vertex-based decomposition
problem. Given a graph $G = (V, E)$ and an integer $k > 1$, we say that
$D = \{V_1, \ldots, V_k\}$ is a $(\beta, k)$-decomposition of $V$ if the subgraphs $G_i = (V_i, E_i)$
of $G$ induced by the $V_i$'s are such that $\bigcup_{i=1}^{k} V_i = V$, and $|V_i| \le
\beta \lceil |V|/k \rceil$, for all $i \in \{1, \ldots, k\}$. Note that in such a decomposition $G_1, \ldots, G_k$
may be pair-wise overlapping.

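In this section, we denote by $\partial(G_i)$ the set of vertices in $V_i$ that are also
nodes of some other subgraph $G_j$, $j \ne i$. As in multi-way graph partitioning, we
consider the following two costs associated with a $(\beta, k)$-decomposition $D$:

$$\text{total-boundary}_V(D) = \sum_{i=1}^{k} |\partial(G_i)|, \qquad
\text{max-boundary}_V(D) = \max_{1 \le i \le k} |\partial(G_i)|.$$

Note that

$$\sum_{i=1}^{k} |V_i| = |V| + \text{total-boundary}_V(D) - \Bigl|\bigcup_{i=1}^{k} \partial(G_i)\Bigr|.$$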
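
The vertex-based costs, and the relation between the total size of the pieces, the number of vertices and the total boundary, can be checked on small examples. The function name `vertex_boundary_costs` and the representation of a decomposition as a list of vertex sets are assumptions made for this sketch.

```python
def vertex_boundary_costs(parts):
    """Return (total-boundary, max-boundary, boundaries) of a decomposition.

    parts -- list of vertex collections V_1, ..., V_k, possibly overlapping.
    The boundary of a piece is the set of its vertices that also belong to
    some other piece.
    """
    parts = [set(p) for p in parts]
    boundaries = []
    for i, vi in enumerate(parts):
        others = set().union(*(p for j, p in enumerate(parts) if j != i))
        boundaries.append(vi & others)
    total = sum(len(b) for b in boundaries)
    worst = max((len(b) for b in boundaries), default=0)
    return total, worst, boundaries
```

For the overlapping pieces {0,1,2}, {2,3,4}, {4,5,6}, {6,7,8} of a 9-vertex path, the boundaries are {2}, {2,4}, {4,6} and {6}, so the total boundary is 6 and the maximum is 2. The piece sizes sum to 12, which equals 9 + 6 - 3, where 3 is the number of distinct shared vertices.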

Given a graph $G = (V, E)$, a subset $C$ of $V$ is called a vertex-bisector of $G$ if it
is a vertex separator of $G$ that partitions $V$ into two disjoint subsets of size at
most $|V|/2$. Again, let $\mathcal{G}$ denote a family of graphs (not necessarily of bounded-
degree) that is closed under the subgraph operation. Let $0 < \alpha < 1$. We say that
$\mathcal{G}$ has an $n^{\alpha}$-vertex-separator theorem or $\mathcal{G}$ is $n^{\alpha}$-vertex-separable if there is a
constant $c$ such that every $n$-node graph in $\mathcal{G}$ has a vertex-bisector of size $c n^{\alpha}$.
Moreover, we refer to the latter type of bisectors as $n^{\alpha}$-vertex-separators. There
are several families of graphs that have $n^{\alpha}$-vertex-separators and are closed
under the subgraph operation. Examples include planar graphs [11], graphs with
bounded genus [10], graphs with no $h$-clique minor for a constant $h$ [1], well-
shaped meshes in $\mathbb{R}^d$ and nearest neighbor graphs in $\mathbb{R}^d$ [13,14,16].






Marcos Kiwi et al. 



Below we state two vertex-separator results similar in spirit to those pre- 
sented in Section 3. Their proofs follow from the same type of arguments as 
those developed in Section 3. Thus, we omit the proofs. 

Lemma 12. Let $0 < \epsilon < 1/2$. Let $\mathcal{G}$ be a family of $n^{\alpha}$-vertex-separable graphs
closed under the subgraph operation. Let $G = (V, E)$ be a graph in $\mathcal{G}$ such that
$|V|$ is a power of two. Let $S \subseteq V$ be a subset of $V$. Then, one can find an
$O(n^{\alpha})$-vertex-separator that divides $G$ into two subgraphs $G_1 = (V_1, E_1)$ and
$G_2 = (V_2, E_2)$ such that $|S \cap V_1| = \lfloor |S|/2 \rfloor$, $|S \cap V_2| = \lceil |S|/2 \rceil$, and $|V_1|, |V_2| \le
(1 + \epsilon)|G|/2$.

Theorem 13. Let $k$ be an integer such that $k > 1$ and $\epsilon$ be a constant such
that $0 < \epsilon < \min\{1/2, 1 - 2^{1-1/\alpha}\}$. Let $\mathcal{G}$ be a family of $n^{\alpha}$-vertex-separable
graphs closed under the subgraph operation. Then, for every graph $G = (V, E)$
in $\mathcal{G}$ a $(2, k)$-decomposition $D$ such that $\text{max-boundary}_V(D) = O((|V|/k)^{\alpha})$ can
be constructed.

6 Conclusions 

We have conducted experiments on several variations of the algorithm presented
in this paper. On various finite element meshes in both two and three dimensions,
the experiments show that the constant hidden in the big-O bound on the boundary
size in all the separator theorems presented in the paper is less than 1.5.

Abstract. The concepts of lowness and highness originate from recursion
theory and were introduced into complexity theory by Schöning
[Sch85]. Informally, a set is low (high, resp.) for a relativizable class $\mathcal{K}$ of
languages if it does not add (adds maximal, resp.) power to $\mathcal{K}$ when used
as an oracle. In this paper we introduce the notions of boolean lowness
and boolean highness. Informally, a set is boolean low (boolean high,
resp.) for a class $\mathcal{K}$ of languages if it does not add (adds maximal, resp.)
power to $\mathcal{K}$ when combined with $\mathcal{K}$ by boolean operations. We prove
properties of boolean lowness and boolean highness which show a lot
of similarities with the notions of lowness and highness. Using Kadin's
technique of hard strings (see [Kad88, Wag87, CK96, BCO93]) we show
that the sets which are boolean low for the classes of the boolean hierarchy
are low for the boolean closure of $\Sigma_2^p$. Furthermore, we prove a
result on boolean lowness which has as a corollary the best known result
(see [BCO93]; in fact even a bit better) on the connection of the
collapses of the boolean hierarchy and the polynomial-time hierarchy: If
$\mathrm{BH} = \mathrm{NP}(k)$ then $\mathrm{PH} = \Sigma_2^p(k-1) \oplus \mathrm{NP}(k)$.

Keywords: Computational complexity, lowness, highness, boolean lowness,
boolean highness, boolean hierarchy, polynomial-time hierarchy,
hard/easy, advice, collapse.



1 Introduction 

The concept of lowness and highness was originally studied in a recursion
theoretic context ([Coo74] and [Soa74]). At that time the question arose how to
measure the content of information of an oracle used by a Turing machine. An
oracle was called low for a given class $\mathcal{K}$ if it does not add power to the machines
accepting sets from $\mathcal{K}$. It was called high for $\mathcal{K}$ if it adds (in a sense) maximal
power to $\mathcal{K}$. These notions have been studied particularly in the context of the
arithmetic hierarchy.

Many ideas and concepts were translated from recursion theory into the terms
of complexity theory. So Schöning ([Sch83],[Sch85]) introduced the notions of
lowness and highness into complexity theory. In the context of the polynomial-
time hierarchy he defined $\mathrm{Low}_k =_{df} \{A \in \mathrm{NP} \mid (\Sigma_k^p)^A = \Sigma_k^p\}$ and
$\mathrm{High}_k =_{df} \{A \in \mathrm{NP} \mid (\Sigma_k^p)^A = \Sigma_{k+1}^p\}$ for $k \ge 0$. It was shown in [Sch85] that the
$\mathrm{Low}_k$ as well as the $\mathrm{High}_k$ classes build a (possibly non-proper) hierarchy, that the
collapse of the polynomial-time hierarchy is related to some properties of $\mathrm{Low}_k$ and
$\mathrm{High}_k$ sets, and that the lower classes of these hierarchies can be characterized in
well-known terms.

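In Section 3 we introduce the notions of boolean lowness and boolean highness.
Informally, a set is boolean low (boolean high, resp.) for a class $\mathcal{K}$ of languages
if it does not add (adds maximal, resp.) power to $\mathcal{K}$ when combined with $\mathcal{K}$ by
boolean operations. We make this precise in the context of the classes $NP(k)$
of the boolean hierarchy. Let $\mathcal{R}_m^p(A)$ be the class of all languages which are
$\le_m^p$-reducible to $A$ and let $\widehat{\mathcal{R}}_m^p(A) =_{df} \mathcal{R}_m^p(A) \cup P$. For classes $\mathcal{K}$ and $\mathcal{K}'$ let
further co-$\mathcal{K} =_{df} \{\overline{A} \mid A \in \mathcal{K}\}$, $\mathcal{K} \wedge \mathcal{K}' =_{df} \{A \cap B \mid A \in \mathcal{K}, B \in \mathcal{K}'\}$, and
$\mathcal{K} \vee \mathcal{K}' =_{df} \{A \cup B \mid A \in \mathcal{K}, B \in \mathcal{K}'\}$. For $k \ge 0$, a set $A \in NP$ is in $low_k^b$ iff
$NP(k) \wedge \widehat{\mathcal{R}}_m^p(A) = NP(k) \vee \widehat{\mathcal{R}}_m^p(A) = NP(k) \wedge \text{co-}\widehat{\mathcal{R}}_m^p(A) = NP(k) \vee \text{co-}\widehat{\mathcal{R}}_m^p(A) = NP(k)$;
a set $A \in NP$ is in $high_k^b$ iff
$NP(k) \wedge \widehat{\mathcal{R}}_m^p(A) = NP(k) \wedge NP$, $NP(k) \vee \widehat{\mathcal{R}}_m^p(A) = NP(k) \vee NP$,
$NP(k) \wedge \text{co-}\widehat{\mathcal{R}}_m^p(A) = NP(k) \wedge \text{co-}NP$, and $NP(k) \vee \text{co-}\widehat{\mathcal{R}}_m^p(A) = NP(k) \vee \text{co-}NP$.
For the classes $low_k^b$ and $high_k^b$ we prove results which are very similar to those
for the classes $Low_k^p$ and $High_k^p$ (in the context of the boolean hierarchy rather
than the polynomial-time hierarchy).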

In Section 4 we relate boolean lowness to lowness. Using Kadin's technique
of hard strings (see [Kad88, Wag87, CK96, BCO93]) we prove that $A \in low_k^b$
implies $(\Sigma_2^p)^A \subseteq \Sigma_2^p(2k-1)$, where $\Sigma_2^p(k)$ denotes the $k$-th level of the boolean
hierarchy over $\Sigma_2^p$. Hence every $low_k^b$ set is low for the boolean closure of $\Sigma_2^p$,
and consequently $low_k^b \subseteq Low_3^p$ for all $k \ge 0$.

These results have interesting consequences for the connection between the
collapses of the boolean hierarchy and the polynomial-time hierarchy. Kadin
[Kad88] showed that $BH = NP(k)$ implies $PH = \Sigma_3^p$. This was improved
in [Wag87], where $\Sigma_3^p$ could be replaced by the boolean closure of $\Sigma_2^p$, and
independently in [CK96], where $\Sigma_3^p$ was replaced by $\Sigma_2^p(k)$. Eventually, Beigel,
Chang, and Ogihara [BCO93] proved that $BH = NP(k)$ implies the collapse
of the polynomial-time hierarchy to the class of all languages that are computable
in polynomial time with $k-1$ parallel queries to a $\Sigma_2^p$ set and an
unbounded number of queries in NP. Adapting their method we prove in Section 5
that $low_k^b \cap High_1^p \ne \emptyset$ implies $PH = \Sigma_2^p(k-1) \oplus \Delta_2^p$ for $k \ge 1$, where
$\mathcal{K} \oplus \mathcal{K}' =_{df} \{A \triangle B \mid A \in \mathcal{K}, B \in \mathcal{K}'\}$. From this and a recent result by Chang
[Cha97] one can conclude: If $BH = NP(k)$ then $PH = \Sigma_2^p(k-1) \oplus NP(k)$,
which even slightly improves the [BCO93] result.



2 Preliminaries 

With $P^B$ ($NP^B$) we denote the class of all languages accepted by a deterministic
polynomial time (nondeterministic polynomial time, resp.) oracle machine
using the oracle $B$. If the queries to an oracle depend on the answers to previous
queries, we will call them adaptive queries. The queries are said to be made in
parallel if a list of all queries is calculated before the machine asks the oracle.
The class of all languages accepted by deterministic polynomial time (nondeterministic
polynomial time) oracle machines making only parallel queries to an
oracle $B$ will be denoted by $P_{\|}^B$ ($NP_{\|}^B$). For a class $\mathcal{K}$ of languages we define

$P^{\mathcal{K}} =_{df} \bigcup_{B \in \mathcal{K}} P^B$, $NP^{\mathcal{K}} =_{df} \bigcup_{B \in \mathcal{K}} NP^B$, and $P_{\|}^{\mathcal{K}} =_{df} \bigcup_{B \in \mathcal{K}} P_{\|}^B$.

For a function $r : \mathbb{N} \to \mathbb{N}$ we denote by $P^B[r]$ the class of languages accepted
by some deterministic polynomial time oracle machine asking the oracle only
$r(n)$ times for an input of length $n$. Similarly let $P_{\|}^B[r]$ be the class of languages
accepted by deterministic polynomial time oracle machines making only $r(n)$ parallel
queries to $B$.

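For classes $\mathcal{K}$ and $\mathcal{K}'$ of languages we define co-$\mathcal{K} =_{df} \{\overline{A} \mid A \in \mathcal{K}\}$,
$\mathcal{K} \wedge \mathcal{K}' =_{df} \{A \cap B \mid A \in \mathcal{K}, B \in \mathcal{K}'\}$, $\mathcal{K} \vee \mathcal{K}' =_{df} \{A \cup B \mid A \in \mathcal{K}, B \in \mathcal{K}'\}$, and
$\mathcal{K} \oplus \mathcal{K}' =_{df} \{A \triangle B \mid A \in \mathcal{K}, B \in \mathcal{K}'\}$. The boolean hierarchy over a complexity class
$\mathcal{K} \supseteq P$ consists of the classes $\mathcal{K}(k)$ and co-$\mathcal{K}(k)$ for $k = 0, 1, \ldots$ which are defined
inductively by $\mathcal{K}(0) =_{df} P$ and $\mathcal{K}(k+1) =_{df} \text{co-}\mathcal{K}(k) \wedge \mathcal{K}$. Furthermore let

$BH(\mathcal{K}) =_{df} \bigcup_{k \ge 0} \mathcal{K}(k)$.

With $\mathcal{K} = NP$ we obtain the well-known boolean hierarchy over NP (see [CH86],
[WW85], [Kob85], [KSW87] and [CGH+88]). For convenience we set $BH =_{df}
BH(NP)$.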

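We will need the following lemma which is proved for $\mathcal{K} = NP$ in [KSW87].
The proofs remain valid in the general case. The equality in statement 5 can be
found in [Wag97].

Lemma 1 Let $\mathcal{K} \supseteq P$ be closed under union and intersection, and let $k > 0$.

1. $\mathcal{K}(2k+1) = \mathcal{K}(2k) \vee \mathcal{K}$

2. $\mathcal{K}(2k+2) = \mathcal{K}(2k+1) \wedge \text{co-}\mathcal{K}$

3. $\mathcal{K}(k+2) = \mathcal{K}(k) \vee (\mathcal{K} \wedge \text{co-}\mathcal{K}) = \mathcal{K}(k) \wedge (\mathcal{K} \vee \text{co-}\mathcal{K})$

4. $\mathcal{K}(k) = \mathcal{K} \oplus \mathcal{K} \oplus \cdots \oplus \mathcal{K}$ ($k$ times)

5. $\mathcal{K}(k) \subseteq P_{\|}^{\mathcal{K}}[k] = P \oplus \mathcal{K}(k) \subseteq \mathcal{K}(k+1)$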
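
The class operations and the inductive definition of the boolean hierarchy can be tried out on a finite toy model. The sketch below is only an illustration under stated assumptions: the three-element universe, the chain-shaped family `K` and the two-element family `P` (standing in for a class closed under complement) are hypothetical choices, not objects from the paper. On this toy model the levels built by $\mathcal{K}(k+1) = \text{co-}\mathcal{K}(k) \wedge \mathcal{K}$ coincide with the $k$-fold $\oplus$ of Lemma 1.4.

```python
from itertools import product

# Toy universe and toy classes (assumptions made for illustration only).
UNIVERSE = frozenset({0, 1, 2})
P = {frozenset(), UNIVERSE}                      # stand-in for P: closed under complement
K = {frozenset(), frozenset({0}),                # stand-in for K: contains P and is
     frozenset({0, 1}), UNIVERSE}                # closed under union and intersection


def co(cls):
    """co-K: complements of all sets in the class."""
    return {UNIVERSE - a for a in cls}


def meet(c1, c2):
    """K ∧ K': all pairwise intersections."""
    return {a & b for a, b in product(c1, c2)}


def join(c1, c2):
    """K ∨ K': all pairwise unions."""
    return {a | b for a, b in product(c1, c2)}


def xor(c1, c2):
    """K ⊕ K': all pairwise symmetric differences."""
    return {a ^ b for a, b in product(c1, c2)}


def level(k):
    """K(k), built inductively from K(0) = P and K(k+1) = co-K(k) ∧ K."""
    cls = set(P)
    for _ in range(k):
        cls = meet(co(cls), K)
    return cls


def xor_power(k):
    """K ⊕ ... ⊕ K with k >= 1 factors."""
    cls = {frozenset()}
    for _ in range(k):
        cls = xor(cls, K)
    return cls


if __name__ == "__main__":
    for k in (1, 2, 3):
        print(k, len(level(k)), level(k) == xor_power(k))
```

The same helpers can be used to compare `level(3)` with `join(level(2), K)`, which is the toy analogue of statement 1 of the lemma.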

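The polynomial-time hierarchy consists of the classes $\Sigma_k^p$, $\Pi_k^p$, $\Delta_k^p$, and $\Theta_k^p$, defined
inductively by $\Sigma_0^p =_{df} \Pi_0^p =_{df} \Delta_0^p =_{df} \Theta_0^p =_{df} P$ and
$\Sigma_{k+1}^p =_{df} NP^{\Sigma_k^p}$, $\Pi_{k+1}^p =_{df} \text{co-}\Sigma_{k+1}^p$, $\Delta_{k+1}^p =_{df} P^{\Sigma_k^p}$, $\Theta_{k+1}^p =_{df} P^{\Sigma_k^p}[O(\log n)]$. Obviously
$\Sigma_k^p \cup \Pi_k^p \subseteq BH(\Sigma_k^p) \subseteq \Theta_{k+1}^p \subseteq \Delta_{k+1}^p \subseteq \Sigma_{k+1}^p \cap \Pi_{k+1}^p$ for all $k \ge 0$. Finally let

$PH =_{df} \bigcup_{k \ge 0} \Sigma_k^p$.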

3 Definitions and properties 

In a general way, one can define the notions of lowness and highness as follows:

Definition 2

- For any relativizable class $\mathcal{K}$, the set $A$ is low for $\mathcal{K}$ if and only if $\mathcal{K}^A = \mathcal{K}$.
Let $Low(\mathcal{K})$ be the class of all sets which are low for $\mathcal{K}$.

- For any relativizable class $\mathcal{K}$ and any class $\mathcal{M}$, the set $A \in \mathcal{M}$ is high for $\mathcal{K}$
with respect to $\mathcal{M}$ if and only if $\mathcal{K}^A = \mathcal{K}^{\mathcal{M}}$.
Let $High(\mathcal{K}, \mathcal{M})$ be the class of all sets which are high for $\mathcal{K}$ with respect
to $\mathcal{M}$.

In [Sch85] Schöning introduced the classes $Low_k^p =_{df} Low(\Sigma_k^p) \cap NP$ and
$High_k^p =_{df} High(\Sigma_k^p, NP)$ for $k \ge 0$, where $(\Sigma_0^p)^A =_{df} P^A$ and
$(\Sigma_{k+1}^p)^A =_{df} NP^{(\Sigma_k^p)^A}$. The following facts are known about the classes $Low_k^p$ and $High_k^p$.



Theorem 3 ([Sch85]) For $k \ge 0$, the following are equivalent:

- $PH = \Sigma_k^p$
- $Low_k^p = NP$
- $High_k^p = NP$
- $Low_k^p \cap High_k^p \ne \emptyset$



Theorem 4 ([Sch85]) For all $k \ge 0$,

1. $Low_k^p \subseteq Low_{k+1}^p$
2. $High_k^p \subseteq High_{k+1}^p$

Theorem 5 ([Sch85])

1. $Low_0^p = P$
2. $Low_1^p = NP \cap \text{co-}NP$
3. $High_0^p = \{A \mid A \text{ is } \le_T^p\text{-complete for } NP\}$

Now we want to have notions of boolean lowness and boolean highness, i.e. notions
which are not based on oracle constructions (which build the polynomial-time
hierarchy) but on the boolean constructions which build the boolean hierarchy.
Let $\mathcal{R}_m^p(A)$ be the class of all languages which are $\le_m^p$-reducible to $A$ and
let $\widehat{\mathcal{R}}_m^p(A) =_{df} \mathcal{R}_m^p(A) \cup P$. Note that $\widehat{\mathcal{R}}_m^p(A) = \mathcal{R}_m^p(A)$ if $A \ne \emptyset$ and $A \ne \Sigma^*$,
and that $\widehat{\mathcal{R}}_m^p(A) = P$ otherwise.



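Definition 6

- For any class $\mathcal{K} \supseteq P$, the set $A$ is boolean low for $\mathcal{K}$ if and only if
$\mathcal{K} \wedge \widehat{\mathcal{R}}_m^p(A) = \mathcal{K} \vee \widehat{\mathcal{R}}_m^p(A) = \mathcal{K} \wedge \text{co-}\widehat{\mathcal{R}}_m^p(A) = \mathcal{K} \vee \text{co-}\widehat{\mathcal{R}}_m^p(A) = \mathcal{K}$.
Let $low(\mathcal{K})$ be the class of all sets which are boolean low for $\mathcal{K}$.

- For any class $\mathcal{K} \supseteq P$ and any $\mathcal{M} \supseteq P$ which is closed under $\le_m^p$, the
set $A \in \mathcal{M}$ is boolean high for $\mathcal{K}$ with respect to $\mathcal{M}$ if and only if
$\mathcal{K} \wedge \widehat{\mathcal{R}}_m^p(A) = \mathcal{K} \wedge \mathcal{M}$, $\mathcal{K} \vee \widehat{\mathcal{R}}_m^p(A) = \mathcal{K} \vee \mathcal{M}$, $\mathcal{K} \wedge \text{co-}\widehat{\mathcal{R}}_m^p(A) = \mathcal{K} \wedge \text{co-}\mathcal{M}$, and
$\mathcal{K} \vee \text{co-}\widehat{\mathcal{R}}_m^p(A) = \mathcal{K} \vee \text{co-}\mathcal{M}$.
Let $high(\mathcal{K}, \mathcal{M})$ be the class of all sets which are boolean high for $\mathcal{K}$ with
respect to $\mathcal{M}$.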

The following properties are easy to prove. 

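Proposition 7 Let $\mathcal{K} \supseteq P$ and $\mathcal{M} \supseteq P$. For sets $A$ and $B$ such that $A \le_m^p B$,

1. If $B \in low(\mathcal{K})$ then $A \in low(\mathcal{K})$.

2. If $A \in high(\mathcal{K}, \mathcal{M})$ and $B \in \mathcal{M}$ then $B \in high(\mathcal{K}, \mathcal{M})$.

Proposition 8 Let $\mathcal{K} \supseteq P$ be closed under union, intersection, and $\le_m^p$.

1. $low(\mathcal{K}) = \mathcal{K} \cap \text{co-}\mathcal{K}$.

2. If $\mathcal{M} \vee \text{co-}\mathcal{M} \subseteq \mathcal{K}$ then $high(\mathcal{K}, \mathcal{M}) = \mathcal{M}$.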






Consequently, boolean lowness and boolean highness are interesting mainly for
classes $\mathcal{K}$ which are not closed under union and intersection. The classes of the
boolean hierarchy (besides the levels 0 and 1) most likely have this property.
For them we define special classes $low_k^b$ and $high_k^b$ as analogues of the classes
$Low_k^p$ and $High_k^p$ for the polynomial-time hierarchy.

Definition 9 For $k \ge 0$,

1. $low_k^b =_{df} low(NP(k)) \cap NP$
2. $high_k^b =_{df} high(NP(k), NP)$

Boolean lowness and boolean highness for the class $NP(k)$ are strongly connected
with collapse properties for the boolean hierarchy.

Theorem 10 For $k \ge 0$, the following are equivalent:

- $BH = NP(k)$
- $low_k^b = NP$
- $high_k^b = NP$
- $low_k^b \cap high_k^b \ne \emptyset$

The next result exhibits the nature of the classes $low_0^b$, $low_1^b$ and $high_0^b$.

Theorem 11

1. $low_0^b = P$
2. $low_1^b = NP \cap \text{co-}NP$
3. $high_0^b = \{A \mid A \text{ is } \le_m^p\text{-complete for } NP\}$ if $P \ne NP$, and $high_0^b = NP$ if $P = NP$

Thus these classes are similar to the corresponding $Low_k^p$ and $High_k^p$ classes.
More precisely: $low_0^b = Low_0^p$, $low_1^b = Low_1^p$ and $high_0^b \subseteq High_0^p$. The next
theorem demonstrates that the inclusion structure of the $low_k^b$ classes ($high_k^b$
classes, resp.) is similar to that of the $Low_k^p$ classes ($High_k^p$ classes, resp.).



Theorem 12 For all $k \ge 0$,

1. $low_k^b \subseteq low_{k+1}^b$
2. $high_k^b \subseteq high_{k+2}^b$
3. $high_k^b \subseteq high_{2k+1}^b$

Proof: Use Lemma 1 to decompose $NP(k+1)$, $NP(k+2)$, and $NP(2k+1)$ in
such a way that the assumption can be used. □



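Unfortunately, we are not able to prove $high_k^b \subseteq high_{k+1}^b$, not even the easiest
case $high_1^b \subseteq high_2^b$. This is equivalent to proving that $NP \wedge \text{co-}\widehat{\mathcal{R}}_m^p(A) =
NP \wedge \text{co-}NP$ and $NP \vee \text{co-}\widehat{\mathcal{R}}_m^p(A) = NP \vee \text{co-}NP$ imply $(NP \wedge \text{co-}NP) \vee \widehat{\mathcal{R}}_m^p(A) =
(NP \wedge \text{co-}NP) \vee NP$ and $(NP \wedge \text{co-}NP) \vee \text{co-}\widehat{\mathcal{R}}_m^p(A) = (NP \wedge \text{co-}NP) \vee \text{co-}NP$.
By Theorem 11 it is obvious that the classes $low_0^b$ and $low_1^b$ are closed under
complement. The next theorem shows the consequences of other $low_k^b$ and $high_k^b$
classes being closed under complement. The behavior of the $low_k^b$ and $high_k^b$ classes
seems to differ from that of the $Low_k^p$ and $High_k^p$ classes, which are obviously
closed under complement.

Theorem 13

1. For $k \ge 2$, $low_k^b = \text{co-}low_k^b \iff low_k^b = NP \cap \text{co-}NP$.

2. For $k \ge 0$, $high_k^b = \text{co-}high_k^b \iff NP = \text{co-}NP$.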
4 $low_k^b$ sets are low for $BH(\Sigma_2^p)$

In this section we show that our $low_k^b$ sets are low for $BH(\Sigma_2^p)$ and hence in
$Low_3^p$. To prove this, we use hard and easy arguments similar to those introduced in
[Kad88] (see also: [BCO93], [CK96], [BF96], [Cha97], [HHH97a] and [HHH97b]).
The projection function of the to the component of a tuple will be 
denoted by (xi, . . . The projection function of the 

component will be denoted by the shortcut (xi, . . . , =^f (xi , . . . ^Xk) 

For every set A and /^ > 0 we define the set Ak by 

^0 =df ^ 

Ak+I =df {(^1, • • •,Xk+2)\{xi, • • • e SATk hXk+2 e A} {k> 0) 



Lemma 14 For k >0 and A G NP\{0^ ^ then Ak <m SATj^, 

Proof: Define NF\2k) =df NF{2k) and NF\2k F 1) =df co-NF{2k F 1) for 
k >0. We observe that NP\k F 1) = co-NP\k) A co-NP. Hence SATj^ is <^~ 
complete for NP\k + 1) for /^ > 0, and Ak G co-NP\k) A co-lZ^[A) for k > 1. 
Using these facts and A G we conclude Ak G co-AP^(/^ + l) Aco-77^(A) = 

co-NP\k + 1) = co-nP^{SATk) = 77^(SATT). □ 

Similar to [CK96] we define the notion of hard sequences. Note that a hard
sequence wrt. $h$ for length $m$ there corresponds to a hard sequence wrt. $(SAT, h, m)$
here.



Definition 15 

— Let k > 1, rn > 1; j = 1, . . . , A — 1, A C F*, and h : (U*)^ ^ (U*)^. lUe 
call (xi, . . . ,Xj) a hard sequence wrt. (A, h^rn) ijf j = 0 or 

F l<j<k-l, 

2. \xj \ < rn^ 

3. Xj £ A if j = 1 and Xj G SAT ifj > U 

4- Vyi,...,yfe-j e . . ,yk-j,Xj,. . . e SAT), 

5. (xi, . . . ^Xj-i) is a hard sequence wrt. (A, A, rn). 

- We call $j$ the order of the hard sequence $(x_1, \ldots, x_j)$.

- If there exists no hard sequence wrt. $(A, h, m)$ of order greater than $j$ then every
hard sequence wrt. $(A, h, m)$ of order $j$ is called a maximal hard sequence
wrt. $(A, h, m)$.

For our main results (Theorem 20 and Corollary 21) we need some technical
lemmas whose proofs follow the ideas developed in [Kad88], [CK96] and
[BCO93]. Because of the page restriction we omit these proofs here.

Let $k \ge 1$ and $A \in low_k^b$. By Lemma 14 we get $A_{k-1} \le_m^p SAT_{k-1}$. The
next lemma shows that under this assumption one can also reduce $A_{k-j-1}$ to
$SAT_{k-j-1}$ by using a hard sequence of order $j$.
Lemma 16 Let A G NF\{0^ U }^k>2^m>l^j = l^...^k — l^ and Ak-i
SATfc_i via h G FP . If (xi, . . . ^Xj) is a hard sequence wri, then for 

all , . . . , Vk—j ^ F 

{Vl ? • • • ? Vk—j ) ^ S h{y\ , . . . , Vk—j ? ^ jf ? • • • ? ) (^1 ,k—j) ^ jf — 1 • 

The next lemma shows, that we can use maximal hard sequences to reduce H to 

SAT 

Lemma 17 Let A G NF\{0, k > 1, and Ak-i SATj^_i via h G FF. 
There exist a set B G NP and a polynomial r such that for every n; // (xi , . . . , Xj ) 

71 

is a maximal hard sequence wrt, (A,/i,r(n)) then for all y G 2J~ : 
yeA^ {y,l^,xi,...,Xj) e B 

Lemma 18 Let A G NF, k > 1, and Ak-i <m SATj^_i via h G FP. For every 
set L G NF^ there exists a set C G NP and a polynomial s such that for every n: 

YIj 

If (xi, . . . , Xj) is a. maximal hard sequence wrt. (A, /i, s{n)) then for all w G N~ : 
w E L (tc, xi, . . . , Xj) G C 

Lemma 19 Let A G NP, k > 1, and Ak-i <m SATj^_i via h G FP. For every 
set L G there exists a set D G Nf (ind a polynomial t such that for every 

n: If j G {0, 1, . . . , — 1} is the order of a maximal hard sequence wrt. (A, /i, t(n)) 

then for all z £ S~ : 

zeL<^{z,l^J) G D 

If 3 greater than the order of a maximal hard sequence wrt. (A,/i,t(n)) then 
{zA^J)^D. 

Theorem 20 Let $k \ge 1$ and let $A \in low_k^b$.

1. $(\Sigma_2^p)^A \subseteq \Sigma_2^p(2k-1)$

2. $(\Delta_2^p)^A \subseteq \Sigma_2^p(k-1) \oplus \Delta_2^p$

Proof: Let A G lovj^. By Lemma 14 we get Ak-i <Si SATj^_i via a suitable 
function h G FP. 

For L G we obtain by Lemma 19 a set D G Sf and a polynomial t such 

that for every n: If j G {0, 1, . . . , — 1} is the order of a maximal hard sequence 

71 

wrt. (A,/i,t(n)) then z G L (z, P^, j) G for all z G F~ and (z,P^,j) ^ D 
if j is greater than the order of a maximal hard sequence wrt. (A,/i,t(n)). We 
define the Bf 



Xm 



=df{(l’^0)|3xi, . . . , Xj G B and (P^, xi, . . . , Xj) is a hard 
sequence wrt. (A, /i, t(n))}, 
and we obtain:

z e L o(((z, e D)v e T)))® 

1) e T) © (((^, i'"l, 1) e a) V ((i*d"l),2) e :^')))© 



fc - 2) e T) © (((z, l'"U - 2) e a) V fc - 1) e :^')))© 

1) e 7')©((^,i'^U- 1) e D) 

Consequently, L G 0 0 ... 0 = 0'f (2/^ — 1) (Lemma 1.4). 

" V ^ 

2k-l 

For L £ let M be a deterministic polynomial time machine accepting 

L with oracle U G NP"^ and running time bounded by a polynomial t. By 
Lemma 14 we have Ak-i via a suitable function h G FP. By 

Lemma 18 there exists a set C G NP and a polynomial s such that for every 
n: If (xi, . . . ,Xj) is a maximal hard sequence wrt. {A^h^ ^('^)) then w E L' <=> 

(tc, F^, xi, . . . , Xj) G C for all w G 0'“^. Hence there a set E G P^^ such 
that for every n: If (xi, . . . ,Xj) is a maximal hard sequence wrt. (H, /i, s(t(n))) 

then z G L (z, xi, . . . , Xj) G E for all z G . To use the mind- 
change technique we consider on the set a partial order □ defined by 

(xi, . . . ,x^) □ (yi, . . .,yj) iff {% < j and x/ = y/ for / = 1, . . . ,i). Define 

77 , 

F =df ^ ^ ' ' ' 3~(“ □ . _ □ ~ A ~ 

is hard wrt. (H, /i, s(t(n))) A 

ce{z, 1*©)) 7^ ce{z, “) 7^ ■ ■ V cb( 2;, l*©txj))}. 
Obviously F' G T'f , and we obtain 

z e (z, 1*©)) e r’©(z,i*©\i) e ©©•••©(z,!*©^ - i) e F 

77 , 

for all z G . Hence L G Zi|00'| 0 0 . . . 0 — l)0Zi| (Lemma 

' V ^ 

k-1 

1.4). □ 

Now we are able to relate boolean lowness to classical lowness.

Corollary 21 For $k \ge 0$, $A \in low_k^b \Rightarrow BH(\Sigma_2^p)^A = BH(\Sigma_2^p)$; i.e. $low_k^b \subseteq
Low(BH(\Sigma_2^p))$.

Proof: By Theorem 20 and Lemma 1.5 we conclude $(\Sigma_2^p(m))^A = (\Sigma_2^p)^A(m) \subseteq
(\Sigma_2^p(2k-1))(m) \subseteq P_{\|}^{\Sigma_2^p}[(2k-1) \cdot m] \subseteq \Sigma_2^p(2km - m + 1)$. □
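
The index in the last inclusion of this proof follows from Lemma 1.5 by plain arithmetic. The block below spells out that step; the concrete values $k = 2$, $m = 3$ are an illustrative choice and do not appear in the paper.

```latex
% Lemma 1.5 gives P_{\|}^{\mathcal{K}}[j] \subseteq \mathcal{K}(j+1).
% Apply it with \mathcal{K} = \Sigma_2^p and j = (2k-1)\cdot m parallel queries:
\[
  P_{\|}^{\Sigma_2^p}\bigl[(2k-1)\cdot m\bigr]
  \;\subseteq\; \Sigma_2^p\bigl((2k-1)\cdot m + 1\bigr)
  \;=\; \Sigma_2^p(2km - m + 1).
\]
% Illustrative instance (assumed values): k = 2, m = 3 gives
% (2\cdot 2 - 1)\cdot 3 = 9 parallel queries, hence level 2\cdot 2\cdot 3 - 3 + 1 = 10.
```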



Corollary 22 $low_0^b = Low_0^p \subseteq low_1^b = Low_1^p \subseteq low_2^b \subseteq \cdots \subseteq Low(BH(\Sigma_2^p)) \subseteq
Low_3^p$.
5 Consequences to collapse results

From Theorem 20.1 we immediately get a statement on the connection between
the collapses of the boolean hierarchy and the polynomial-time hierarchy.

Corollary 23 For $k \ge 1$, if $BH = NP(k)$ then $PH = \Sigma_2^p(2k-1)$.

Proof: From $BH = NP(k)$ we get $SAT \in low_k^b$ by Theorem 10 and $\Sigma_3^p =
(\Sigma_2^p)^{SAT} \subseteq \Sigma_2^p(2k-1)$ by Theorem 20.1. Hence $PH = \Sigma_2^p(2k-1)$. □

This improves Kadin's original result from [Kad88] but it is not as good as the
result $BH = NP(k) \Rightarrow PH = \Sigma_2^p(k)$ from [CK96] or the further improvement
in [BCO93]. It is obvious that an improvement of Theorem 20.1 by replacing
$\Sigma_2^p(2k-1)$ by a smaller class $\mathcal{K}$ yields the improvement $BH = NP(k) \Rightarrow PH =
\mathcal{K}$ of Corollary 23. However, to follow the idea from [BCO93] to improve Theorem
20.1 we have to combine Theorem 20.1 and Theorem 20.2. But this cannot be
done without the additional assumption that $A$ is not "too easy".

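Lemma 24 For $k \ge 1$, if $A \in low_k^b \cap High_1^p$ then $(\Sigma_2^p)^A \subseteq \Sigma_2^p(k-1) \oplus \Delta_2^p$.

Proof: By Theorem 20 we obtain $(\Sigma_2^p)^A \subseteq \Sigma_2^p(2k-1)$ and $(\Delta_2^p)^A \subseteq \Sigma_2^p(k-1) \oplus \Delta_2^p$.
Since $A \in High_1^p$ we can conclude $(\Sigma_2^p)^A \subseteq P^{\Sigma_2^p} = P^{NP^A} \subseteq \Sigma_2^p(k-1) \oplus \Delta_2^p$.
□

Theorem 25 For $k \ge 1$, if $low_k^b \cap High_1^p \ne \emptyset$ then $PH = \Sigma_2^p(k-1) \oplus \Delta_2^p$.

Proof: For $A \in low_k^b \cap High_1^p$ we obtain $(\Sigma_2^p)^A \subseteq \Sigma_2^p(k-1) \oplus \Delta_2^p$ by Lemma
24 and $(\Sigma_2^p)^A = \Sigma_3^p$ because of $High_1^p \subseteq High_2^p$. Hence $\Sigma_3^p = \Sigma_2^p(k-1) \oplus \Delta_2^p$
and consequently $PH = \Sigma_2^p(k-1) \oplus \Delta_2^p$. □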

Now we get as a corollary the best known result on the connection between the
collapses of the boolean hierarchy and the polynomial-time hierarchy. This can
be found implicitly in [BCO93].

Corollary 26 For $k \ge 1$, if $BH = NP(k)$ then $PH = \Sigma_2^p(k-1) \oplus NP(k)$.

Proof: From $BH = NP(k)$ we get $SAT \in low_k^b$ by Theorem 10. Since $SAT \in
High_1^p$ we obtain $PH = \Sigma_2^p(k-1) \oplus \Delta_2^p$ by Theorem 25. However, it is known
from [Cha97] that $BH = NP(k)$ implies $\Delta_2^p = NP(k)$. □

Since all the above results are relativizable, we also have:¹

Corollary 27 For $m, k \ge 1$, if $BH(\Sigma_m^p) = \Sigma_m^p(k)$ then $PH = \Sigma_{m+1}^p(k-1) \oplus
\Sigma_m^p(k)$.

¹ After completion of this work the authors were informed by Lane A. Hemaspaandra
that this result was independently obtained by Hemaspaandra,
Hemaspaandra, and Hempel [HHH98].
Abstract. Cronauer et al. [2] introduced the chain method to separate
counting classes by oracles. Among the classes for which this method
is applicable are NP, coNP, and the MOD-classes. As these counting classes
are defined via subsets of $\mathbb{N}^k$, it is natural to ask for the minimum value
of $k$ such that a given class can be defined via such a set. We call this
value the inherent dimension of the respective class.

The inherent dimension is a very natural concept, but it is quite hard to
check the value for a given bounded counting class. Thus, we complement
this notion by the notion of type-3 dimension, which is less natural than
inherent dimension, but very easy to check. We compare type-3 dimension
and inherent dimension, with the result that for classes of inherent
dimension less than 3, both notions coincide, and generally the inherent
dimension is never smaller than the type-3 dimension.

For $k \le 2$ we can completely solve the questions whether a given class
has inherent dimension $k$, and which are the minimal classes with that
dimension. For $k \ge 3$ we give a sufficient condition for a class being
of dimension at least $k$. We disprove the conjecture that this is also a
necessary condition by a counterexample.



1 Introduction 

The term "counting classes" has been used for many complexity classes which
have some counting process inherent in their definition; for an overview see [4].
These classes can be defined by putting certain restrictions on the outcomes of
#P functions, where #P denotes Valiant's basic counting class [12]. The following
general framework has been introduced in [2]: Let $\mathcal{V}$ be a (finite or infinite)
set of integers. Then the class $(\mathcal{V})P$ is the class of all languages $L$ for which there
exists a function $f \in \#P$ such that $x \in L \iff f(x) \in \mathcal{V}$. In other words: $L$
is a language from $(\mathcal{V})P$ if there is a nondeterministic Turing machine $M$ which
has a number of accepting paths from the set $\mathcal{V}$ exactly for inputs which are
words in $L$. (See also [3].) In general, we will consider sets $\mathcal{V}$ of vectors of natural
numbers and vectors of #P functions; this corresponds to Turing machines as
follows: Given some input and the corresponding computation of our machine $M$,
we count, for each symbol of some output alphabet, the number of times it is
output in $M$'s accepting computations. An input is accepted iff the vector of
numbers formed in this way belongs to $\mathcal{V}$.



Wen-Lian Hsu, Ming- Yang Kao (Eds.): COCOON’98, LNCS 1449, pp. 157-166, 1998. 
Investigating complexity classes in the area between P and PSPACE always
leads to the difficulty that one cannot hope to find (unconditional) separations,
because this whole area might still collapse. Thus, one of the main
techniques to develop relations among such complexity classes is the oracle
separation technique. We make this precise by calling two classes equal if and only
if they are equal under all relativizations, and the same for inclusion. In this
terminology, it has been shown in [2], for instance, that all bounded counting
classes are not closed under complement.

The starting point for these results was given in 1992 by Bovet, Crescenzi,
and Silvestri [1], who presented a uniform way to define complexity classes, which
nowadays is referred to as the leaf language approach (see also [8,9,10] and the
recent textbook [11]). They also gave a general and sufficient criterion for two
classes defined in such a way to be separable by an oracle. A similar result has been
obtained independently by Vereshchagin [13].

Though these results considerably simplified questions whether oracles with
certain properties exist by reducing them to combinatorial questions, the
arguments were still sometimes a bit clumsy. In [5], however, a technique was
developed by which the criterion from [1] is much easier to apply in the case
of bounded counting classes. In [6], things were pushed a bit further and an
algorithm was presented that, given two explicit bounded counting classes, decides
whether they can be separated by an oracle or not. However, this algorithm has
the drawback that it does not give explicit answers when the classes are given
in a parameterized form.

Finally in [2], using a refinement of the main result from [6], two very easily
applicable methods, called the First and Second Chain Method, were developed,
which make it possible to decide for many bounded counting classes whether they
can be separated by an oracle or not. By these methods, for example, the question
which relativizable inclusions exist between classes of the Boolean Hierarchy over NP
and other bounded counting classes was completely resolved.

The current paper builds on these techniques and investigates the question,
given a subset $\mathcal{U}$ of $\mathbb{N}^n$, what is the minimum value of $k$ such that a subset
$\mathcal{V}$ of $\mathbb{N}^k$ exists for which $(\mathcal{U})P = (\mathcal{V})P$ (relative to all oracles). We call this
value the inherent dimension of the complexity class $(\mathcal{U})P$. We introduce another
characteristic of $(\mathcal{U})P$, which we call the type-3 dimension. We show that the
type-3 dimension can never be greater than the inherent dimension, and that the
type-3 dimension is equal to the inherent dimension if the inherent dimension
is less than 3. Using this result, we show that there are exactly two minimal
elements in the set of all classes of (inherent) dimension at least 1, namely NP
and coNP (and these two classes are incomparable). Similarly, we exhibit four
explicit classes (NP(2), coNP(2), NP ∧ co-1-NP, coNP ∨ 1-NP), which are all
minimal elements in the set of all classes of (inherent) dimension at least 2.

The technique generalizes to give $2^d$ minimal elements in the set of all classes
of type-3 dimension at least $d$ for $d > 2$. However, finally we show that this does
not say much about inherent dimensions, by giving an example of a class which
is of inherent dimension 3, but of type-3 dimension 2.
2 Preliminaries

2.1 Notations

We assume familiarity with basic concepts of complexity theory. We will use the
following notations for complexity classes:

- The classes P, NP, coNP, 1-NP, #P, and MOD-classes should be well known.

- If $C$ is a complexity class, then the class co-$C$ is defined as the class of all
languages $L$ such that $\overline{L} \in C$.

- If $C_1$ and $C_2$ are two complexity classes, then the class $C_1 \wedge C_2$ is defined as the
class of all languages $L$ such that there are languages $L_1 \in C_1$ and $L_2 \in C_2$
satisfying $L = L_1 \cap L_2$; $C_1 \vee C_2$ is analogously defined via $L = L_1 \cup L_2$.

- The classes of the Boolean Hierarchy over a class $C$ are the classes which
can be obtained by iterated application of the operators ∧, ∨, and co- to the
class $C$, e.g. $C \wedge \text{co-}C$ or $(C \wedge \text{co-}C) \vee C$.

- The classes NP($k$) and coNP($k$) are the classes of the Boolean Hierarchy
over NP.



2.2 Complexity Classes Defined by Counting 

Many well-known complexity classes can be defined by counting as done in the 
following definition. 

Definition 1. For $\mathcal{V} \subseteq \mathbb{N}^k$ the complexity class $(\mathcal{V})P$ is defined as the class of
all languages $L$ such that there exist functions $f_1, \ldots, f_k \in \#P$ satisfying

$x \in L \iff (f_1(x), \ldots, f_k(x)) \in \mathcal{V}$.

Using the class $\#P^A$ (#P relative to oracle $A$) instead of #P we obtain the
relativized complexity class $(\mathcal{V})P^A$.

If there is a bound $m \in \mathbb{N}$ such that $u \in \mathcal{V}$ if and only if $\min(u, m) \in \mathcal{V}$ (the
minimum taken componentwise), then we call $(\mathcal{V})P$ a bounded counting class.
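
The bound condition in Definition 1 can be checked mechanically on a finite box of vectors. The following is a minimal sketch under stated assumptions: the two membership predicates and the search limit are hypothetical examples chosen for illustration, and a finite check can of course only refute a bound, not prove it for all of $\mathbb{N}^k$.

```python
from itertools import product


def cap(u, m):
    """Componentwise minimum of the vector u and the bound m."""
    return tuple(min(x, m) for x in u)


def odd_positive(u):
    """Example set: vectors with an odd number of positive components."""
    return sum(1 for x in u if x > 0) % 2 == 1


def odd_ones(u):
    """Example set: vectors with an odd number of components equal to 1."""
    return sum(1 for x in u if x == 1) % 2 == 1


def respects_bound(member, k, m, limit):
    """Check u in V <=> min(u, m) in V for all u in {0..limit}^k."""
    return all(member(u) == member(cap(u, m))
               for u in product(range(limit + 1), repeat=k))


if __name__ == "__main__":
    print(respects_bound(odd_positive, 2, 1, 4))  # bound 1 suffices here
    print(respects_bound(odd_ones, 2, 1, 4))      # bound 1 is too small
    print(respects_bound(odd_ones, 2, 2, 4))      # bound 2 suffices
```

For the second predicate the vector (2, 0) already shows why bound 1 fails: capping it gives (1, 0), which changes the number of components equal to 1.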

2.3 Relativized Equality and Inclusion 

It should be emphasized once again that, whenever we speak of inclusion (or 
equality) of classes, it means inclusion (or equality) under all relativizations. 

Accordingly, whenever we speak of inequality (or noninclusion) of classes, we 
mean that there is an oracle, which makes the classes different. 

3 The First and Second Chain Theorem 



We first recall the chain theorems and the definitions needed for them from [2]: 
Definition 2. A sequence $v_1, \ldots, v_l \in \mathbb{N}^k$ is an alternating chain of length $l$
with respect to $\mathcal{V} \subseteq \mathbb{N}^k$ if $v_i < v_{i+1}$ and $v_i \in \mathcal{V} \iff v_{i+1} \notin \mathcal{V}$ ($1 \le i < l$). We say
that this chain has positive signature if $v_1 \in \mathcal{V}$; otherwise, we say that it has
negative signature.
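
Definition 2 translates directly into a small checker. The sketch below assumes that $<$ on vectors means the componentwise order (less or equal in every component, and not equal); the function names and the example set are hypothetical and serve only as illustration.

```python
def less(u, v):
    """Componentwise order: u <= v in every component and u != v (assumed reading of '<')."""
    return u != v and all(a <= b for a, b in zip(u, v))


def is_alternating_chain(chain, member):
    """True iff consecutive vectors increase and alternate between V and its complement."""
    return all(less(u, v) and member(u) != member(v)
               for u, v in zip(chain, chain[1:]))


def signature(chain, member):
    """'positive' if the first vector of the chain belongs to V, else 'negative'."""
    return "positive" if member(chain[0]) else "negative"


def odd_sum(u):
    """Example set: vectors whose component sum is odd."""
    return sum(u) % 2 == 1


if __name__ == "__main__":
    chain = [(0, 0), (1, 0), (1, 1)]
    print(is_alternating_chain(chain, odd_sum), signature(chain, odd_sum))
```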

Theorem 1 (First Chain Theorem). Let $\mathcal{U} \subseteq \mathbb{N}^k$ and $\mathcal{V} \subseteq \mathbb{N}^{k'}$ be of bounded
significance, and suppose that $(\mathcal{U})P \subseteq (\mathcal{V})P$ (in all relativizations). If there is
an alternating chain with respect to $\mathcal{U}$, then there is an alternating chain of the
same length and the same signature with respect to $\mathcal{V}$.

Definition 3. Let $\mathcal{V} \subseteq \mathbb{N}^k$ be of bounded significance with bound $m$. We say a
sequence $v_1, \ldots, v_s$ is a type-2 alternating chain with respect to $\mathcal{V}$ of length $s$, if

- $v_1 < \cdots < v_s \le (m, m, \ldots, m)$ ($k$ times),

- $v_i \in \mathcal{V} \iff v_{i+1} \notin \mathcal{V}$ for $i = 1, \ldots, s-1$, and

- if $v_i$ and $v_{i+1}$ differ in the $j$-th component, then the $j$-th component of $v_{i+1}$
is equal to $m$ ($i = 1, \ldots, s-1$; $j = 1, \ldots, k$).

If $v_1$ belongs to $\mathcal{V}$, then we say that this chain has positive signature, otherwise
it has negative signature.

Theorem 2 (Second Chain Theorem). Let $\mathcal{U} \subseteq \mathbb{N}^k$ and $\mathcal{V} \subseteq \mathbb{N}^{k'}$ be of
bounded significance, and suppose that $(\mathcal{U})P \subseteq (\mathcal{V})P$ (in all relativizations). If
there is a type-2 alternating chain w.r.t. $\mathcal{U}$, then there exists a type-2 alternating
chain of the same length and the same signature w.r.t. $\mathcal{V}$.

Proposition 1. If $\mathcal{V} \subseteq \mathbb{N}^k$ is of bounded significance with bound $m$, then the
length of every alternating chain w.r.t. $\mathcal{V}$ is bounded by $mk + 1$, and the length
of every type-2 alternating chain w.r.t. $\mathcal{V}$ is bounded by $k + 1$.
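
The bound $mk + 1$ for ordinary alternating chains can be observed by brute force on small examples. The sketch below is an illustration only: it assumes the componentwise order on vectors, restricts the search to the box $\{0, \ldots, m\}^k$, and uses two hypothetical example sets.

```python
from functools import lru_cache
from itertools import product


def longest_alternating_chain(member, k, m):
    """Length of a longest alternating chain inside the box {0..m}^k (brute force)."""
    box = list(product(range(m + 1), repeat=k))

    def less(u, v):
        return u != v and all(a <= b for a, b in zip(u, v))

    @lru_cache(maxsize=None)
    def best_from(u):
        # A chain starting at u has length 1 plus the best continuation.
        return 1 + max((best_from(v) for v in box
                        if less(u, v) and member(u) != member(v)), default=0)

    return max(best_from(u) for u in box)


def odd_sum(u):
    """Example set: component sum is odd."""
    return sum(u) % 2 == 1


def odd_positive(u):
    """Example set: odd number of positive components."""
    return sum(1 for x in u if x > 0) % 2 == 1


if __name__ == "__main__":
    k, m = 2, 2
    print(longest_alternating_chain(odd_sum, k, m), "<=", m * k + 1)
    print(longest_alternating_chain(odd_positive, k, m), "<=", m * k + 1)
```

For the parity-of-sum set the bound is attained, for instance by the chain (0,0), (1,0), (2,0), (2,1), (2,2).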

4 Type-3 Dimension 

In [2] the chain theorems served to obtain a lot of separation results between
classes from different hierarchies. Looking at these theorems with the aim to
define "the dimension" of bounded counting classes it seems evident that the
ordinary alternating chains from the first chain theorem do not help, as there
can be arbitrarily long alternating chains in one-dimensional sets $\mathcal{V}$ (i.e. $\mathcal{V} \subseteq \mathbb{N}$).
We leave the obvious proof of this fact as an exercise.

How well are type-2 chains suited to serve as a basis for a definition of dimension?
In [2] a series of rather complex classes was shown to have only type-2
chains of length at most 2, namely the classes $\bigvee_k$ 1-NP and co-$\bigvee_k$ 1-NP. It is
easy to see that these classes cannot be represented in the form $(\mathcal{V})P$ for any
$\mathcal{V} \subseteq \mathbb{N}^{k-1}$. So again we conclude that the existence of type-2 chains does not
reflect the intuitive idea of "dimension" very well.

So neither ordinary alternating chains nor type-2 alternating chains should
be used to define the dimension of a bounded counting class. The following
definition looks more reasonable:
Definition 4. A type-3 alternating chain of length $l$ with respect to $\mathcal{V} \subseteq \mathbb{N}^k$
(where $\mathcal{V}$ is of bounded significance with bound $m$) is a sequence of pairs

$(v_1, v_1'), (v_2, v_2'), \ldots, (v_l, v_l')$

where all $v_i$ and $v_i'$ are elements from $\mathbb{N}^k$ with components at most $m$, such
that for all $i \in \{1, \ldots, l-1\}$ we have $v_i < v_i' \le v_{i+1}$, and $v_l < v_l'$; further for all
$i \in \{1, \ldots, l\}$ we have $v_i \in \mathcal{V} \iff v_i' \notin \mathcal{V}$, and if $v_i$ and $v_i'$ differ in the $j$-th
component, then $v_i'$ has value $m$ in the $j$-th component. The signature of a type-3
alternating chain is a sequence $t \in \{+, -\}^l$, where $t_i = +$ if $v_i' \in \mathcal{V}$
and $t_i = -$ if $v_i' \notin \mathcal{V}$.

Note that unlike the situation for ordinary or type-2 chains the signature of
a longest type-3 alternating chain is not uniquely determined.

Proposition 2. Let $\mathcal{V} \subseteq \mathbb{N}^k$ be of bounded significance. Then any type-3 alternating
chain w.r.t. $\mathcal{V}$ has length at most $k$.

Proof. Just observe that in any pair $(v_i, v_i')$ of the chain at least one component
has to change from less than $m$ to $m$, where $m$ is the bound of $\mathcal{V}$. Together with
the monotonicity of the entire chain, we obtain that after $k$ pairs the vector $v_k'$
would necessarily equal $(m, \ldots, m)$, so a further pair cannot exist.
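
The conditions of Definition 4 can be collected into a small checker. This is a sketch under stated assumptions: vectors are compared componentwise, consecutive pairs are linked by the non-strict condition $v_i' \le v_{i+1}$ (a reading of the definition, which is partly illegible in the source), and the example set is a hypothetical choice.

```python
def leq(u, v):
    """Componentwise u <= v."""
    return all(a <= b for a, b in zip(u, v))


def is_type3_chain(pairs, member, m):
    """Check the type-3 conditions for a list of pairs (v_i, v_i') and bound m."""
    for i, (v, w) in enumerate(pairs):
        if any(x < 0 or x > m for x in v + w):
            return False                      # components must lie in 0..m
        if v == w or not leq(v, w):
            return False                      # v_i < v_i'
        if member(v) == member(w):
            return False                      # membership must flip inside a pair
        if any(a != b and b != m for a, b in zip(v, w)):
            return False                      # changed components jump to m
        if i + 1 < len(pairs) and not leq(w, pairs[i + 1][0]):
            return False                      # v_i' <= v_{i+1} (assumed non-strict)
    return True


def type3_signature(pairs, member):
    """One sign per pair: '+' if v_i' is in V, '-' otherwise."""
    return "".join("+" if member(w) else "-" for _, w in pairs)


def odd_positive(u):
    """Example set: odd number of positive components."""
    return sum(1 for x in u if x > 0) % 2 == 1


if __name__ == "__main__":
    chain = [((0, 0), (1, 0)), ((1, 0), (1, 1))]
    print(is_type3_chain(chain, odd_positive, 1), type3_signature(chain, odd_positive))
```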



4.1 The Third Chain Theorem 

Theorem 3 (Third Chain Theorem). Let $\mathcal{U} \subseteq \mathbb{N}^k$, $\mathcal{V} \subseteq \mathbb{N}^{k'}$ be of bounded
significance and let $(\mathcal{U})P \subseteq (\mathcal{V})P$ (relativizably). If there is a type-3 alternating
chain of length $l$ with signature $t$ w.r.t. $\mathcal{U}$, then there is a type-3 alternating
chain of length $l$ with signature $t$ w.r.t. $\mathcal{V}$.

Proof. We build on a theorem from [6], where it was shown that under our
assumptions a monotone mapping $f$ from $\mathbb{N}^k$ to $\mathbb{N}^{k'}$ must exist, which can in
each component be written as a linear combination of multinomial coefficients,
such that $u \in \mathcal{U} \iff f(u) \in \mathcal{V}$. In particular, this means that the mapping $f$ is
"superlinear" in the sense that for all $u, v \in \mathbb{N}^k$ with $u < v$ and for every $r \in \mathbb{N}$

$f(u + r \cdot (v - u)) \ge f(u) + r \cdot (f(v) - f(u))$.

Now let any type-3 alternating chain w.r.t. $\mathcal{U}$ be given. Let $m$ be the
bound of $\mathcal{U}$, and let $m'$ be the bound of $\mathcal{V}$. We may increase the bound of $\mathcal{U}$ to
$m \cdot (m' + 1)$. In our type-3 chain, we now have to change all components of value
$m$ to value $m \cdot (m' + 1)$. This does not change the property of being a type-3
alternating chain, but now the difference between $v_i$ and $v_i'$ in all pairs from this
chain has increased at least by a factor of $m'$. Thus, because of the superlinearity
of $f$, under $f$ all these vectors are mapped in such a way that whenever $f(v_i)$ and
$f(v_i')$ differ in the $j$-th component, then the $j$-th component of $f(v_i')$ is at least
$m'$. Replacing all vectors $f(v_i)$ and $f(v_i')$ by $\min(f(v_i), m')$ and $\min(f(v_i'), m')$,
respectively (the minimum taken componentwise), results in a type-3 alternating
chain with respect to $\mathcal{V}$ of the same length and signature as our original chain
with respect to $\mathcal{U}$.
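
The superlinearity inequality used in this proof can be checked numerically for a concrete mapping. The sketch below is only an illustration: the mapping `f` is a hypothetical example built from binomial coefficients and is not the mapping from [6]; the search ranges are likewise arbitrary.

```python
from itertools import product
from math import comb


def f(n1, n2):
    """Hypothetical monotone mapping N^2 -> N^2 built from binomial coefficients."""
    return (comb(n1, 2) + n2, comb(n1, 1) * comb(n2, 2))


def superlinear_at(u, v, r):
    """Check f(u + r(v-u)) >= f(u) + r(f(v) - f(u)) componentwise."""
    point = tuple(a + r * (b - a) for a, b in zip(u, v))
    fu, fv, fp = f(*u), f(*v), f(*point)
    return all(p >= x + r * (y - x) for p, x, y in zip(fp, fu, fv))


def check_box(limit, max_r):
    """Check the inequality for all u < v in {0..limit}^2 and r = 0..max_r."""
    box = list(product(range(limit + 1), repeat=2))
    return all(superlinear_at(u, v, r)
               for u in box for v in box
               if u != v and all(a <= b for a, b in zip(u, v))
               for r in range(max_r + 1))


if __name__ == "__main__":
    print(check_box(limit=3, max_r=5))
```

The inequality holds here because each component of `f` is convex and nondecreasing along any increasing direction, which is the property the proof relies on.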
4.2 Definition of Type-3 Dimension

The third chain theorem motivates our definition of a type-3 dimension:

Definition 5. Let $C$ be a bounded counting class, $C = (\mathcal{V})P$ for $\mathcal{V} \subseteq \mathbb{N}^k$ with
bounded significance. Let $l$ be the length of a longest type-3 alternating chain
w.r.t. $\mathcal{V}$. Then $l$ is called the type-3 dimension of $C$.

Lemma 1. The type-3 dimension of a bounded counting class $C$ is well defined,
i.e. it does not depend on the choice of the set $\mathcal{V}$ with $C = (\mathcal{V})P$.

Proof. We have to show that if $C = (\mathcal{U})P = (\mathcal{V})P$ (in all relativizations), then
the longest type-3 alternating chains in $\mathcal{U}$ and in $\mathcal{V}$ are equally long. Because
of symmetry, it suffices to show that if $(\mathcal{U})P \subseteq (\mathcal{V})P$, then the longest type-3
alternating chain w.r.t. $\mathcal{U}$ is not longer than the longest type-3 alternating chain
w.r.t. $\mathcal{V}$. But the latter is a direct consequence of Theorem 3.

We want to compute the type-3 dimension for all classes from the Boolean
Hierarchies over NP and 1-NP. For $k \ge 1$ let

\[
\begin{aligned}
A_k &= \{(n_1, \dots, n_k) \mid \#\{i \mid n_i > 0\} \text{ is odd}\}, &
D_k &= \{(n_1, \dots, n_k) \mid \#\{i \mid n_i = 1\} \text{ is even}\}, \\
B_k &= \{(n_1, \dots, n_k) \mid \#\{i \mid n_i > 0\} \text{ is even}\}, &
E_k &= \{(n_1, \dots, n_k) \mid \#\{i \mid n_i = 1\} > 0\}, \\
C_k &= \{(n_1, \dots, n_k) \mid \#\{i \mid n_i = 1\} \text{ is odd}\}, &
F_k &= \{(n_1, \dots, n_k) \mid \#\{i \mid n_i = 1\} = 0\}.
\end{aligned}
\]

Then $(A_k)P$ = NP($k$), $(B_k)P$ = co-NP($k$), $(C_k)P$ = 1-NP($k$), $(D_k)P$ =
co-1-NP($k$), $(E_k)P = \bigvee^k$ 1-NP, $(F_k)P$ = co-$\bigvee^k$ 1-NP. (Readers who do not
know some of the classes named here may take the above vector sets as their
definitions.) Clearly, in all cases the longest type-3 alternating chains have length
$k$. Thus we obtain

Theorem 4. The classes NP($k$), coNP($k$), 1-NP($k$), co-1-NP($k$), $\bigvee^k$ 1-NP, and
co-$\bigvee^k$ 1-NP have type-3 dimension $k$.
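
The six vector sets used for the Boolean hierarchies are plain counting conditions on a vector $(n_1, \dots, n_k)$, so membership is easy to test mechanically. The following is a minimal sketch, assuming the sets are defined by the number of components that are positive (NP-type) or exactly 1 (1-NP-type); the labels `A` to `F` and all function names are illustrative choices, not notation confirmed by the paper.

```python
from itertools import product

def count_positive(v):
    """Number of components n_i with n_i > 0."""
    return sum(1 for n in v if n > 0)

def count_ones(v):
    """Number of components n_i with n_i == 1."""
    return sum(1 for n in v if n == 1)

# Hypothetical labels for the six membership conditions.
VECTOR_SETS = {
    "A": lambda v: count_positive(v) % 2 == 1,  # odd number of positive components
    "B": lambda v: count_positive(v) % 2 == 0,  # even number of positive components
    "C": lambda v: count_ones(v) % 2 == 1,      # odd number of components equal to 1
    "D": lambda v: count_ones(v) % 2 == 0,      # even number of components equal to 1
    "E": lambda v: count_ones(v) > 0,           # some component equal to 1
    "F": lambda v: count_ones(v) == 0,          # no component equal to 1
}

def members(label, k, bound):
    """All vectors in {0..bound}^k that satisfy the condition `label`."""
    test = VECTOR_SETS[label]
    return [v for v in product(range(bound + 1), repeat=k) if test(v)]
```

For example, `members("A", 2, 2)` lists the vectors in $\{0,1,2\}^2$ with exactly one positive component. The pairs (A, B), (C, D), and (E, F) are complementary, which matches the co-classes in the list above.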

4.3 Minimal Elements 

In this subsection we want to exhibit minimal classes of a certain type-3 dimen- 
sion. Note that there are no non-trivial classes of type-3 dimension 0. Thus we 
first investigate the case of type-3 dimension 1. 

Theorem 5. The classes NP and coNP have type-3 dimension 1. They are
incomparable, and for every class $C$ of type-3 dimension greater than or equal to 1 we
have either NP $\subseteq C$ or coNP $\subseteq C$.

See the full version [7] for a proof of Theorem 5. 

Theorem 6. The classes NP(2), coNP(2), NP $\wedge$ co-1-NP, and coNP $\vee$ 1-NP
have type-3 dimension 2. They are pairwise incomparable, and for every class $C$
of type-3 dimension greater than or equal to 2, we have that at least one of the given
four classes is contained in $C$.
Proof. We describe these four classes by subsets $V_1, \dots, V_4$ of $\mathbb{N}^2$ with bound 2,
respectively, s.t. $(V_1)P$ = NP(2), $(V_2)P$ = coNP(2), $(V_3)P$ = NP $\wedge$ co-1-NP, and
$(V_4)P$ = coNP $\vee$ 1-NP.

[Tables defining $V_1$, $V_2$, $V_3$, and $V_4$: for each set, a $3 \times 3$ grid over the
coordinate values 0, 1, 2 marking membership with + and -. The individual entries
are not recoverable here.]

It is an easy exercise to show that these sets describe the four classes as
desired, and that by the third chain theorem, they are incomparable.

Now let $C$ be any bounded counting class of type-3 dimension 2. Let $(v_1, v_1')$,
$(v_2, v_2')$ be a type-3 chain w.r.t. $U \subseteq \mathbb{N}^k$, where $C = (U)P$. We may w.l.o.g.
assume that $v_2' = (m, \dots, m)$, where $m$ is the bound of $U$.

Let this chain be of signature $t$. We claim that

1) $t = {+}{-} \Rightarrow (V_1)P \subseteq C$
2) $t = {-}{+} \Rightarrow (V_2)P \subseteq C$
3) $t = {+}{+} \Rightarrow (V_3)P \subseteq C$
4) $t = {-}{-} \Rightarrow (V_4)P \subseteq C$

Only cases 1) and 3) have to be proven, because the other two cases follow
by complementation. We give only the proof for case 1) now. Case 3) is similar,
though a little more complicated.

In case 1) we have $v_1 \notin U$, $v_1' \in U$, $v_2 \in U$, $v_2' \notin U$. We show that
$(V_1)P \subseteq (U)P$ by providing a map $f$ of admissible form, which maps $\mathbb{N}^2$ to
$\mathbb{N}^k$ in such a way that $(x, y) \in V_1$ if and only if $f(x, y) \in U$. The map is

\[ f(x, y) = v_1 + \binom{y}{2} \cdot (v_1' - v_1) + x \cdot v_2'. \]

We have to prove that $(x, y) \in V_1$ if and only if $f(x, y) \in U$. Let $x = 0$ and
$y \ge 2$. Then $(x, y) \in V_1$. We obtain $f(x, y) = v_1 + \binom{y}{2} \cdot (v_1' - v_1)$, which is
in $U$ if and only if $v_1 + 1 \cdot (v_1' - v_1)$ is in $U$. But the latter equals $v_1'$, so it
definitely is in $U$.

Now, let $x = 0$ and $y < 2$. Then we obtain $f(x, y) = v_1$, which is not in $U$.
Finally, let $x > 0$. Then $f(x, y) \ge v_2' = (m, \dots, m)$, and thus
$f(x, y) \in U \iff v_2' \in U$, but by assumption we have $v_2' \notin U$, and so
$f(x, y) \notin U$.

In all cases we obtained $f(x, y) \in U \iff (x, y) \in V_1$, so $f$ in fact shows
that NP(2) $= (V_1)P \subseteq (U)P = C$.
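
The case analysis above can be replayed on a small instance. The sketch below reads the map as $f(x, y) = v_1 + \binom{y}{2}(v_1' - v_1) + x \cdot v_2'$ and takes $V_1$ to contain $(x, y)$ exactly when $x = 0$ and $y \ge 2$, as the three cases of the proof suggest. The set `U`, its bound, and the chain vectors are hypothetical toy choices made only for illustration; whether they form a type-3 chain in the paper's formal sense is not checked here.

```python
from itertools import product
from math import comb

BOUND = 1  # assumed bound m of the toy set U: components above m count as m

def cap(v, bound=BOUND):
    """Cut every component off at the bound."""
    return tuple(min(c, bound) for c in v)

def in_U(v):
    """Toy set U in N^2: only the (capped) vector (1, 0) is a member."""
    return cap(v) == (1, 0)

def in_V1(x, y):
    """Membership in V_1 as used in the case analysis: x = 0 and y >= 2."""
    return x == 0 and y >= 2

# Hypothetical chain vectors: v1 not in U, v1' in U, v2' = (m, ..., m) not in U.
V1_LOW, V1_HIGH, V2_HIGH = (0, 0), (1, 0), (1, 1)

def f(x, y):
    """f(x, y) = v1 + C(y, 2) * (v1' - v1) + x * v2', computed componentwise."""
    c = comb(y, 2)
    return tuple(a + c * (b - a) + x * t
                 for a, b, t in zip(V1_LOW, V1_HIGH, V2_HIGH))

def reduction_holds(limit=6):
    """Check (x, y) in V_1  <=>  f(x, y) in U on the grid {0..limit-1}^2."""
    return all(in_V1(x, y) == in_U(f(x, y))
               for x, y in product(range(limit), repeat=2))
```

On this toy instance `reduction_holds()` evaluates to true: inputs with $x = 0$, $y \ge 2$ land on (a vector equivalent to) $v_1'$, inputs with $x = 0$, $y < 2$ land on $v_1$, and inputs with $x > 0$ land at or above $v_2'$.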

Theorems 5 and 6 can be generalized as follows: 

Theorem 7. There are $2^d$ classes of type-3 dimension $d$ which are pairwise
incomparable, and for every class $C$ of type-3 dimension greater than or equal to $d$,
we have that at least one of these $2^d$ classes is contained in $C$.

These classes may be constructed in such a way that for every string
$t \in \{+, -\}^d$ exactly one of these classes has a type-3 alternating chain of length $d$
and signature $t$.

5 Inherent Dimension 

In Section 4 we introduced a notion of dimension which has the advantage of
being quite easily applicable, because for every subset $U \subseteq \mathbb{N}^k$ all longest type-3
alternating chains can be constructed by trying all finitely many possibilities.
In the current section we will introduce another notion, which is the obvious
notion intuitively associated with the term "dimension", but with the major
disadvantage that it is not at all clear how to compute this dimension for a
given bounded counting class.

Definition 6. Let $C$ be a bounded counting class, and let $k$ be minimal such that
there is a set $U \subseteq \mathbb{N}^k$ of bounded significance with $C = (U)P$. Then $k$ is called
the inherent dimension of the class $C$.

This definition together with Proposition 2 yields

Lemma 2. Let $C$ be a bounded counting class with inherent dimension $k$ and
type-3 dimension $d$. Then $k \ge d$.

Like the type-3 dimension, the inherent dimension also gives us the natural
value of $k$ for the classes from the Boolean Hierarchies over NP and over 1-NP,
i.e. for NP($k$), coNP($k$), 1-NP($k$), co-1-NP($k$), $\bigvee^k$ 1-NP, and co-$\bigvee^k$ 1-NP:

Lemma 3. All of the classes NP($k$), coNP($k$), 1-NP($k$), co-1-NP($k$), $\bigvee^k$ 1-NP,
and co-$\bigvee^k$ 1-NP have inherent dimension $k$.

Proof. The description of these classes by the six vector sets defined in
Subsection 4.2 shows that their inherent dimension is at most $k$, since all these
sets were subsets of $\mathbb{N}^k$.

But, on the other hand, Theorem 4 together with Lemma 2 shows that their
inherent dimension is at least $k$.

This lemma might give some hope that type-3 dimension and inherent di- 
mension coincide in all cases. This would be very convenient, since the inherent 
dimension is the more natural notion, while the type-3 dimension is the one that 
can be checked more easily. In fact, we will show that the two notions coincide 
for all classes of inherent dimension less than 3, but they do not coincide for 
greater dimensions. 

5.1 Classes of Inherent Dimension 1 

Let $C$ be a bounded counting class of inherent dimension 1. We want to show
that in this case also the type-3 dimension is 1. From Lemma 2 we already know
that the type-3 dimension is at most 1. But it cannot be 0, since only the trivial
classes (the class containing only the empty set, and the class containing only
$\Sigma^*$) have type-3 dimension 0.

In fact the converse is also true, but this is not obvious. See the full version 
[7] for a proof. 

Theorem 8. Let $C$ be a bounded counting class. $C$ has inherent dimension 1 if
and only if $C$ has type-3 dimension 1.

Thus, using Theorem 5, we trivially obtain:

Theorem 9. The classes NP and coNP have inherent dimension 1. They are
incomparable, and for every class $C$ of inherent dimension greater than or equal to 1
we have either NP $\subseteq C$ or coNP $\subseteq C$.

5.2 Classes of Inherent Dimension 2 

As with Theorem 9, we transfer Theorem 6 to the case of inherent dimension:

Theorem 10. The classes NP(2), coNP(2), NP $\wedge$ co-1-NP, and coNP $\vee$ 1-NP
have inherent dimension 2. They are pairwise incomparable, and for every class
$C$ of inherent dimension greater than or equal to 2, we have that at least one of the
given four classes is contained in $C$.



5.3 Classes of Inherent Dimension greater than 2 

The conjecture that the minimal classes are the same for inherent dimension and
for type-3 dimension in every stage is refuted by the following counterexample:
We define a class $C$ by a subset $U \subseteq \mathbb{N}^3$ with bound 3. The definition is given
by tables showing the $(x, y)$-projection of $U$ for the four cases $z = 0, \dots, z = 3$
(rows are labelled 3 down to 0, columns 0 to 3; rows showing only two signs are
incomplete here):

           z = 0        z = 1        z = 2        z = 3
    3      - + - +      + + + +      - + - +      + + + +
    2      - +          + + + +      - +          - + - +
    1      + + + +      + + + +      + + + +      + + + +
    0      - +          + + + +      - +          - + - +
           0 1 2 3      0 1 2 3      0 1 2 3      0 1 2 3

Theorem 11. The bounded counting class $C = (U)P$ with the set $U$ as given
above has type-3 dimension 2, but inherent dimension 3.

The proof of Theorem 11 can be found in the full paper [7]. 

6 Conclusion 

In this paper, we introduced the notion of dimension for bounded counting
classes. As these classes can generally be described by a nondeterministic
polynomial time machine with accepting paths of $k$ different kinds (or $k$ different
outputs), and possibly also rejecting paths, one might naturally ask for the
minimum $k$ such that the given class can be defined using this model. We call this
number the inherent dimension of the counting class.

We proved that for small dimensions this dimension can be checked by looking
for the longest alternating monotone chain with respect to the defining set $U$.

In dimensions 1 and 2 we explicitly found the minimal elements in the set of 
bounded counting classes of that dimension, namely NP and coNP for dimension 
1, and NP(2), coNP(2), NP Aco-l-NP, and coNP V 1-NP for dimension 2. 

For dimension $d$ at least 3, there are $2^d$ classes which can be defined in analogy
to the lower dimension cases by taking a type-3 chain of any given signature
(thus we obtain $2^d$ cases) and leaving everything else trivial. However, our last
result showed that these classes for dimension 3 do not form a complete set of
minimal elements; there are other classes, which cannot contain any of these 8
classes.

Acknowledgement. I am very grateful to Katja Cronauer for several helpful 
ideas, and to Heribert Vollmer and Klaus Wagner for interesting comments on 
this paper’s subject. 
Abstract. $qAC^0[2]$ is the class of languages computable by circuits of
constant depth and quasi-polynomial ($2^{\log^{O(1)} n}$) size with unbounded
fan-in AND, OR, and PARITY gates. Symmetric functions are those
functions that are invariant under permutations of the input variables.
Thus a symmetric function $f_n : \{0,1\}^n \to \{0,1\}$ can also be seen as a
function $f_n : \{0, 1, \dots, n\} \to \{0,1\}$. We give the following
characterization of symmetric functions in $qAC^0[2]$, according to how $f_n(x)$ changes
as $x$ grows from 0 to $n$. A symmetric function $f = (f_n)$ is in $qAC^0[2]$ if
and only if $f_n$ has period $2^{t(n)} = \log^{O(1)} n$ except within both ends of
length $\log^{O(1)} n$.



1 Introduction 

Proving lower bounds is one of the most fundamental tasks in complexity theory. 
However, it appears to be a rather difficult one, and so far people can only show 
lower bounds for very restricted classes, mainly for variants of constant depth 
circuits. 

Let $AC^0$ denote the class of languages computable by constant depth
polynomial size circuits with AND and OR gates. For any constant $p \in \mathbb{N}$, by allowing
AND, OR, and $MOD_p$ gates, we have the class $AC^0[p]$. Allowing a MAJORITY
gate on the top but only AND and OR gates for the remaining, we get the class
PERCEPTRON. If a letter q is added before any of the class names above, the
circuit size is now allowed to be quasi-polynomial, or $2^{\log^{O(1)} n}$.

The first significant lower bound on the size of such circuits came from Furst,
Saxe, and Sipser [7] and Ajtai [1], showing that the PARITY function is not in
$AC^0$. This was later improved by Yao [12] and Hastad [9], showing that PARITY
is even outside of $qAC^0$. Razborov [10] considered $qAC^0[2]$ and showed that the
MAJORITY function is not in it. Smolensky [11] showed that for any prime $p$
and any constant $c$ not divisible by $p$, $MOD_c$ is not in $qAC^0[p]$. Barrington and
Straubing [3], generalizing the result of Green [8] and Aspnes et al. [2], proved
that for any constant $c$, $MOD_c$ is not in qPERCEPTRON.

Note that all these lower bounds are for symmetric functions. A boolean
function $f : \{0,1\}^* \to \{0,1\}$ can be seen as a sequence of functions
$f_n : \{0,1\}^n \to \{0,1\}$ for $n \in \mathbb{N}$, and vice versa. A function
$f_n : \{0,1\}^n \to \{0,1\}$ is called
symmetric if its value depends merely on the number of 1's in its input, and we
say that $f : \{0,1\}^* \to \{0,1\}$ is symmetric if for all $n \in \mathbb{N}$, $f_n$ is symmetric. A
symmetric function $f_n$ can also be seen as a function from $[n] = \{0, 1, \dots, n\}$
to $\{0,1\}$. We will abuse the notation and also use $f_n$ to denote this function
from $[n]$ to $\{0,1\}$. That is, for $k \in [n]$, $f_n(k) = f_n(1^k 0^{n-k})$. It turns out to be
very useful to look at how $f_n(x)$ changes as $x$ grows from 0 to $n$. The sequence
$f_n(0) f_n(1) \cdots f_n(n)$ is called the weight spectrum of $f_n$.
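
To make the weight spectrum concrete, the following minimal sketch evaluates a boolean function on the inputs $1^k 0^{n-k}$ and also checks symmetry by brute force. The function names and the three sample functions are illustrative choices, not taken from the paper.

```python
from itertools import product

def weight_spectrum(f, n):
    """Return [f(1^k 0^(n-k)) for k = 0..n], the weight spectrum of f on n bits."""
    return [f((1,) * k + (0,) * (n - k)) for k in range(n + 1)]

def is_symmetric(f, n):
    """Brute-force check that f depends only on the number of 1's in its input."""
    spectrum = weight_spectrum(f, n)
    return all(f(x) == spectrum[sum(x)] for x in product((0, 1), repeat=n))

def parity(x):
    """1 iff the number of 1's is odd."""
    return sum(x) % 2

def majority(x):
    """1 iff at least half of the input bits are 1."""
    return int(2 * sum(x) >= len(x))

def first_bit(x):
    """A non-symmetric function, used as a negative example."""
    return x[0]
```

For instance, `weight_spectrum(parity, 4)` gives `[0, 1, 0, 1, 0]`, a spectrum of period 2, while `weight_spectrum(majority, 4)` gives `[0, 0, 1, 1, 1]`, which changes value only once.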

Symmetric functions in some circuit classes appear to be special, as there
are some neat characterizations for them. Fagin, Klawe, Pippenger, and
Stockmeyer [6] pioneered the study of symmetric functions in $AC^0$ in terms of their
weight spectra. Brustmann and Wegener [4] followed this approach, equipped
with Hastad's lower bound, and found an exact characterization for symmetric
functions in $AC^0$. A careful analysis of their proof shows that all the symmetric
functions in $qAC^0$ are also in $AC^0$. So we have the following:

- Suppose that $f$ is a symmetric function. Then $f$ is in $AC^0$ iff $f$ is in $qAC^0$
iff $f_n$ is constant except within both ends of length $\log^{O(1)} n$.

For qPERCEPTRON, Zhang, Barrington, and Tarui [13] gave the following
characterization:

- A symmetric function $f$ is in qPERCEPTRON iff $f_n$ has $\log^{O(1)} n$ many
value changes.

Damm and Lenz [5] attempted a characterization of $AC^0[2]$ but did not quite
succeed. Their characterization was based on some unproven assumption. We
proceed along their line and succeed in characterizing symmetric functions in
$qAC^0[p]$ for any fixed prime $p$:

- A symmetric function $f$ is in $qAC^0[p]$ iff $f_n$ has period $p^{t(n)} = \log^{O(1)} n$
except within both ends of length $\log^{O(1)} n$.

It is not clear whether for symmetric functions qPERCEPTRON would collapse
to PERCEPTRON, or $qAC^0[2]$ would collapse to $AC^0[2]$. However, our result
implies the following:

- The set of symmetric functions in $qAC^0[2]$ is equal to the set of symmetric
functions in $AC^0[2]$ iff $\sigma_k \in AC^0[2]$ for all $k = \log^{O(1)} n$.

2 Preliminaries 

Let $sB_n$ denote the class of symmetric functions from $\{0,1\}^n$ to $\{0,1\}$, and let
$sB$ denote the class of symmetric functions from $\{0,1\}^*$ to $\{0,1\}$. $sB_n$ can be
seen as a vector space of dimension $n + 1$ over $Z_2$, and there are two natural
bases for it. The first one is $\{e_k \mid 0 \le k \le n\}$, where $e_k \in sB_n$ is defined as

\[ e_k(x) = \begin{cases} 1 & \text{if } x = k, \\ 0 & \text{otherwise.} \end{cases} \]



An Exact Characterization of Symmetric Functions in $qAC^0[2]$



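The second one is $\{\sigma_k \mid 0 \le k \le n\}$, where $\sigma_k \in sB_n$ is defined as

\[ \sigma_k(x) = \binom{x}{k} \bmod 2. \]

One can check that $\sigma_k$ is the $k$th symmetric polynomial over $Z_2$, that is,

\[ \sigma_k(x_1, \dots, x_n) = \bigoplus_{I \subseteq \{1, \dots, n\},\ |I| = k} \ \prod_{i \in I} x_i. \]

So for $k = \log^{O(1)} n$, both $e_k$ and $\sigma_k$ are in $qAC^0[2]$.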
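
The two descriptions of the second basis, as a binomial coefficient modulo 2 and as a symmetric polynomial over $Z_2$, can be compared by brute force for small $n$. This is a minimal sketch; the function names are illustrative.

```python
from itertools import combinations
from math import comb

def sigma_binomial(k, x):
    """Value on an input of weight x, computed as C(x, k) mod 2."""
    return comb(x, k) % 2

def sigma_polynomial(k, bits):
    """XOR over all k-element index sets I of the AND of the bits indexed by I."""
    total = 0
    for subset in combinations(range(len(bits)), k):
        total ^= int(all(bits[i] for i in subset))
    return total

def descriptions_agree(n):
    """Compare both descriptions on the inputs 1^x 0^(n-x) for all k and x in 0..n."""
    for k in range(n + 1):
        for x in range(n + 1):
            bits = (1,) * x + (0,) * (n - x)
            if sigma_polynomial(k, bits) != sigma_binomial(k, x):
                return False
    return True
```

The agreement is no surprise: on an input of weight $x$, exactly $\binom{x}{k}$ of the $k$-element AND terms evaluate to 1, and their XOR is that count modulo 2.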

Let $f_n \in sB_n$. Clearly $f_n = \sum_{k \in [n]} f_n(k) e_k$. Let $v(f_n)$ denote the weight
spectrum $f_n(0) f_n(1) \cdots f_n(n) \in \{0,1\}^{n+1}$ of $f_n$. Define $C(f_n)$ to be the smallest
integer $k$ such that $v(f_n)$ is constant except within both ends of length $k$, that
is, the smallest $k$ such that $f_n(x_1 \cdots x_{n-2k} 1^k 0^k)$ is a constant.

For $f_n = \sum_{k \in [n]} a_k \sigma_k$ with coefficients $a_k \in Z_2$, define the degree of $f_n$,
denoted as $D(f_n)$, to be the largest $k$ with $a_k \ne 0$. The period of $f_n$, denoted as
$P(f_n)$, is defined as the smallest $k$ such that $f_n(x) = f_n(x + k)$ for $0 \le x \le n - k$.
The following proposition shows the period of $\sigma_k$. It can be proved using Lucas'
theorem:

Proposition 1 The period of $\sigma_k$ is $2^t$ where $t$ is the smallest integer such that
$k \le 2^t - 1$. That is, $P(\sigma_k) = 2^{\lceil \log_2 (k+1) \rceil}$.
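
The period statement for the binomial basis functions can be checked numerically. The sketch below computes the smallest period of a weight spectrum directly from the definition and compares it with $2^t$, where $t$ is the smallest integer with $k \le 2^t - 1$ (which equals the bit length of $k$). The function names and the choice $n = 40$ are illustrative assumptions.

```python
from math import comb

def sigma_spectrum(k, n):
    """Weight spectrum of x -> C(x, k) mod 2 on the weights 0..n."""
    return [comb(x, k) % 2 for x in range(n + 1)]

def period(spectrum):
    """Smallest p with spectrum[x] == spectrum[x + p] for all 0 <= x <= n - p."""
    n = len(spectrum) - 1
    for p in range(1, n + 2):
        if all(spectrum[x] == spectrum[x + p] for x in range(n - p + 1)):
            return p

def predicted_period(k):
    """2^t for the smallest t with k <= 2^t - 1; that t is k.bit_length()."""
    return 2 ** k.bit_length()
```

The comparison is only meaningful when $n$ is large relative to $2^t$; for very short spectra the smallest period can be smaller simply because too few positions are constrained.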

Corollary 1 Every function in sBn of degree k has period , 

The following can easily be proved using dimension arguments. 

Proposition 2 Every function in $sB_n$ with period $2^t$ has degree less than $2^t$.
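
The degree of a symmetric function can be computed by changing from the value representation to the binomial basis. The sketch below uses binomial inversion modulo 2 (the alternating signs of the usual inversion formula vanish over $Z_2$) and then checks, for small parameters, that every spectrum repeating with period $2^t$ has degree below $2^t$. The convention that the zero function has degree $-1$ and all names are assumptions made for this illustration.

```python
from itertools import product
from math import comb

def sigma_coefficients(spectrum):
    """Coefficients a_k with f(x) = sum_k a_k * C(x, k) mod 2 for x = 0..n.

    Binomial inversion over Z_2: a_k = sum_{j <= k} C(k, j) * f(j) mod 2.
    """
    n = len(spectrum) - 1
    return [sum(comb(k, j) * spectrum[j] for j in range(k + 1)) % 2
            for k in range(n + 1)]

def degree(spectrum):
    """Largest k with a nonzero coefficient; -1 for the zero function (assumed convention)."""
    nonzero = [k for k, a in enumerate(sigma_coefficients(spectrum)) if a]
    return max(nonzero) if nonzero else -1

def periodic_functions_have_low_degree(n, t):
    """Check degree < 2^t for every spectrum on 0..n that repeats every 2^t positions."""
    size = 2 ** t
    for pattern in product((0, 1), repeat=size):
        spectrum = [pattern[x % size] for x in range(n + 1)]
        if degree(spectrum) >= size:
            return False
    return True
```

As a sanity check, the parity spectrum `[0, 1, 0, 1, 0]` has coefficients `[0, 1, 0, 0, 0]`, so its degree is 1, below its period 2.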

We will use a measure for functions in $sB_n$, which is slightly different from
that used by Damm and Lenz [5]. For $f_n \in sB_n$, define $b(f_n)$ as the smallest
integer $k$ such that $f_n \in \mathrm{span}\{e_i, e_{n-i}, \sigma_i \mid 0 \le i \le k\}$, over $Z_2$.

For $f_n \in sB_n$, it is useful to divide $v(f_n)$ into three parts with a periodic
middle part. Let $v(f_n) = \alpha\beta\gamma$, where $|\alpha| \le |\beta| \le |\gamma| \le |\alpha| + 1$. Let $g_n \in sB_n$
be the function with the smallest period such that $v(g_n) = \alpha'\beta\gamma'$ for some $\alpha'$
and $\gamma'$ with $|\alpha'| = |\alpha|$ and $|\gamma'| = |\gamma|$. Define $h_n = f_n \oplus g_n$. The decomposition
$f_n = g_n \oplus h_n$ is called a standard decomposition. So for a standard decomposition
$f_n = g_n \oplus h_n$, $P(g_n), C(h_n) \le \lceil (n+1)/3 \rceil$.
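
A standard decomposition can be computed directly from a weight spectrum. The sketch below splits the spectrum into three parts of nearly equal length, extends the middle part periodically with its smallest period to obtain the periodic function, and XORs to obtain the remainder, which is then nonzero only near the two ends. All names are illustrative, and treating a spectrum with an empty middle part by taking the periodic part to be zero is an assumption of this sketch.

```python
def split_lengths(length):
    """Lengths (a, b, c) with a + b + c == length and a <= b <= c <= a + 1."""
    q, r = divmod(length, 3)
    return q, q + (1 if r == 2 else 0), q + (1 if r >= 1 else 0)

def smallest_period(word):
    """Smallest p >= 1 with word[i] == word[i + p] wherever both positions exist."""
    for p in range(1, len(word) + 1):
        if all(word[i] == word[i + p] for i in range(len(word) - p)):
            return p
    return 1  # empty word

def standard_decomposition(spectrum):
    """Return (g, h): g periodic and equal to the spectrum on its middle third, h = f xor g."""
    a, b, _ = split_lengths(len(spectrum))
    beta = spectrum[a:a + b]
    if not beta:  # degenerate case (assumption): no middle part, periodic part is zero
        g = [0] * len(spectrum)
    else:
        p = smallest_period(beta)
        g = [beta[(x - a) % p] for x in range(len(spectrum))]
    h = [fx ^ gx for fx, gx in zip(spectrum, g)]
    return g, h

def ends_length(spectrum):
    """Smallest k such that the spectrum is constant after dropping k entries at each end."""
    length = len(spectrum)
    for k in range(length):
        if len(set(spectrum[k:length - k])) <= 1:
            return k
    return 0  # empty spectrum
```

For the spectrum `[1, 0, 0, 1, 0, 1, 0, 1, 1]` the middle part `[1, 0, 1]` has smallest period 2, the periodic part is `[0, 1, 0, 1, 0, 1, 0, 1, 0]`, and the remainder `[1, 1, 0, 0, 0, 0, 0, 0, 1]` is constant except within both ends of length 2.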

Let $MAJ_n$ denote the MAJORITY function on $n$ boolean variables, which
outputs 1 iff at least $n/2$ input variables are 1. We will need the following lower
bound of Smolensky [11].

Lemma 1. For any fixed prime $p$, any depth $d$ circuit with AND, OR, and
$MOD_p$ gates for $MAJ_n$ must have size [bound not recoverable here].

Let $MOD_q$ denote the function on $n$ boolean variables that outputs 1 iff
the number of 1's in the input is a multiple of $q$. For any fixed prime $q \ne p$,
the above size lower bound also holds for the function $MOD_q$, as proved in
[11]. However, we need a slightly stronger lemma instead, which can be proved
by slightly modifying Smolensky's argument. Notice the point in the statement
where $q$ is quantified.

Lemma 2. Let $p$ be a fixed prime. There exist constants, among them $n_0$, such that for
any $n > n_0$ and for any prime $q \ne p$ with $q < n/2$, any depth $d$ circuit with
AND, OR, and $MOD_p$ gates computing $MOD_q$ must have size at least [bound not
recoverable here].

3 Main Results 

Lemma 3. Suppose that $f \in sB$ and for each $n \in \mathbb{N}$, $f_n = g_n \oplus h_n$ is a standard
decomposition. If $f$ is in $qAC^0[2]$, then $P(g_n) = \log^{O(1)} n$.

Proof: Let $b = P(g_n)$, which is a function of $n$. We will construct the function
$MAJ_b$ from $f_n$, and a circuit for $MAJ_b$ from the circuit for $f_n$ without blowing
up the size too much. So if $f$ is in $qAC^0[2]$, then $b$ must be small because of the
lower bound for $MAJ_b$.

Consider the interval of length $b$ centered at $\lfloor n/2 \rfloor$. Clearly $f_n$ and $g_n$ agree
at inputs from this interval. Let $A = \{i_1, \dots, i_l\}$, for some integer $l$, be the
set of indices $i$ in that interval where $g_n(i) = 1$. The only index $x$ in that interval
such that $g_n(x + i_j - i_1) = 1$ for each $i_j \in A$ is $x = i_1$, for otherwise $g_n$ has a
smaller period than $b = P(g_n)$.

Let k = [36/2]. Consider those I functions on k variables derived from fn by 
fixing ij — [6/2] variables to 1 and n—b—ij variables to 0, for I < j < L The AND 
of these I functions has weight spectrum for some a of length [6/2]. By 

negating all its variables we get a function with weight spectrum where 

fl is the reverse of a. More precisely, define 

rXxi---xk)= f\ 

i<^<^ 

Then v[ffi) = 0^“^!/?, for some fi of length [ 6 / 2 ]. Next define 
fX{xi---Xb)= \f /^(xi • • 

Then the resulting function is $MAJ_b$. If $f_n$ has a $qAC^0[2]$ circuit of depth $d$ and
size $2^{\log^{O(1)} n}$, then $MAJ_b$ has a depth $d + 2$, size $2^{\log^{O(1)} n}$ circuit. From
Lemma 1, $P(g_n) = b = \log^{O(1)} n$. □

Lemma 4. Suppose that $f \in sB$ and for each $n \in \mathbb{N}$, $f_n = g_n \oplus h_n$ is a standard
decomposition. If $f$ is in $qAC^0[2]$, then $P(g_n)$ is a power of 2 for all but finitely
many $n$.

Proof: Suppose $f \in qAC^0[2]$. We know that $P(g_n) = \log^{O(1)} n$ from the previous
lemma. Now suppose that $P(g_n)$ is not a power of 2 for an infinite number of $n$.
We will show that this leads to a contradiction.

Consider the function derived from fn by discarding both ends of length 
C{hn). That is, for / = n — 2 C(/i^), define 

//(xi ■■■Xi) = /„(xi • • 

Then fj is a periodic function with P(//) = P{gn)' As C(/i^) < |~(n + 

I = n — 2C{hn) = i7(n). So P(/4) = log^^^^ n, and it is not a power of 2 for 
an infinite number of n. Observe that there exist constants no, d, ci, C 2 , C 3 , such 
that when n > no, all the following hold: 

— can be computed by a gAC^[2] circuit of depth d and size 2^°^"^^ as / is 
assumed to be in gAC^ [2 ] . 

- P{fn) < log""" 

- 1 + log^"n 2 ^°s < 2 ^ ^ ^(/4) \ 

— For any prime g ^ p with g < n/2, the function MOD^ is not computable 
by any gAC^[2] circuit of depth d T 1 and size 2 iog ^3 f]-om Lemma 2. 

As P(/^) is not a power of 2 infinitely often, there exist an m > no and 
a prime g ^ 2 such that g \ P(/^) and J > Let b — P(/^), 

r = m — 26, and k = As is a function of period 6 , we can construct 

the function MOD^ and then the function MOD^ as the following: 

MOD^(xi • • -Xr) = f\ /m(^i ' ' *^rL' 0 ^^“\and 
ie[b-l]j;^ (0=1 

MOD^(xi •••Xk)= 

Then MOD^ can be computed by a gAC^\2] circuit of depth d + 1 and size 
1 T 62^*^^''^ ^ < 1 A log^^ ^ < 2 ^*^^ ^ a contradiction. □ 

Lemma 5. Suppose $f \in sB$. If $f$ is in $qAC^0[2]$, then $b(f_n) = \log^{O(1)} n$.

Proof: Suppose that $f$ is in $qAC^0[2]$. For each $n \in \mathbb{N}$, let $f_n = g_n \oplus h_n$ be a
standard decomposition. From Lemmas 3 and 4 we know that $P(g_n) = \log^{O(1)} n$
and is a power of 2 for almost every $n$. So from Proposition 2, $D(g_n) = \log^{O(1)} n$.
Then $g = (g_n)$ is in $qAC^0[2]$, and so is $h = (h_n) = (f_n \oplus g_n)$. Let $b = C(h_n)$,
which is a function of $n$. We will construct the function $MAJ_b$ from $h_n$, and use
the lower bound for $MAJ_b$ to upper bound $b$.

Let v{hn) = vq • • • • • • Vn. Assume without loss of generality 

that Vn-h = 1 (otherwise consider the function d^(xi • • •’x^)). For 1 < i < [ 6 / 2 ], 
by fixing n — 2b-\-i variables to 1 and b — i variables to 0 , we get a function with 
weight spectrum The OR of these functions has weight 

spectrum ^ Qf MAJ 5 . More precisely, define 

hf,{xi ■ ■ ■ Xb) = \J /j„(xi • • 
l<i< [5/2] 
Then the resulting function is $MAJ_b$. If $h_n$ has a depth $d$, size $2^{\log^{O(1)} n}$ circuit,
then $MAJ_b$ has a depth $d + 1$, size $2^{\log^{O(1)} n}$ circuit. From Lemma 1,
$b = \log^{O(1)} n$. So $b(f_n) = \max\{D(g_n), C(h_n) - 1\} = \log^{O(1)} n$. □

Fagin, Klawe, Pippenger, and Stockmeyer [6] showed that for $i = \log^{O(1)} n$,
$e_i$ and $e_{n-i}$ are in $AC^0$. Also for $i = \log^{O(1)} n$, $\sigma_i$ can be computed by a PARITY
of $\binom{n}{i}$ ANDs. So for $f \in sB$, $b(f_n) = \log^{O(1)} n$ implies that $f_n$ can be
computed by the PARITY of some subset of $\{e_i, e_{n-i}, \sigma_i \mid i = \log^{O(1)} n\}$. So we
have our main theorem and a normal form theorem for $sB \cap qAC^0[2]$.

Theorem 1 For $f \in sB$, $f \in qAC^0[2]$ iff $b(f_n) = \log^{O(1)} n$ iff $f_n$ has period
$2^{t(n)} = \log^{O(1)} n$ except at both ends of length $\log^{O(1)} n$.

Theorem 2 Any function in $sB \cap qAC^0[2]$ can be computed by circuits that are
PARITY of a quasi-polynomial number of $AC^0$ circuits.

There is nothing special about $MOD_2$. In fact for any prime $p$, we have the
following similar theorem. The proof is almost identical.

Theorem 3 For $f \in sB$, $f \in qAC^0[p]$ iff $f_n$ has period $p^{t(n)} = \log^{O(1)} n$ except
at both ends of length $\log^{O(1)} n$.
Abstract. We continue the study of robust reductions initiated by Gavalda and
Balcazar. In particular, a 1991 paper of Gavalda and Balcazar [6] claimed an
optimal separation between the power of robust and nondeterministic strong
reductions. Unfortunately, their proof is invalid. We re-establish their theorem.

Generalizing robust reductions, we note that robustly strong reductions are built 
from two restrictions, robust underproductivity and robust overproductivity, both 
of which have been separately studied before in other contexts. By systematically 
analyzing the power of these reductions, we explore the extent to which each 
restriction weakens the power of reductions. We show that one of these reductions 
yields a new, strong form of the Karp-Lipton Theorem. 

1 Introduction 

Reductions are the key tools used in complexity theory to compare the difficulty of prob- 
lems. Beyond that, reductions play a central role in countless theorems of complexity 
theory, and to understand the power of such theorems we must understand the rela- 
tionships between reductions. For example, Karp and Lipton [11] proved that if SAT 
Turing-reduces to some sparse set then the polynomial hierarchy collapses. A more 
careful analysis reveals that the same result applies under the weaker hypothesis that 
SAT robustly-strong-reduces to some sparse set. In fact, the latter result is simply a rel- 
ativized version of the former result [8], though the first proofs of the latter result were 
direct and quite complex [1,10]. As another example, in the present paper — but not by 
simply asserting relativization — we will note that various theorems, among them the 
Karp-Lipton Theorem, indeed hold for certain reductions that are even more flexible 
than robustly strong reductions. 

In this paper, we continue the investigation of robust reductions started by Gavalda 
and Balcazar [6]. We now briefly mention one way of defining strong reduction [16,14] 
and robustly strong reduction [6]. Definition 1 provides a formal definition of the same 
notions in terms of concepts that are central to this paper. We say that a nondetermin- 
istic Turing machine is a nondeterministic polynomial-time Turing machine (NPTM) if 

* A complete version of this paper, including full proofs, is available as [4]. Research supported 
in part by grants DAAD-315-PRO-fo-ab/NSF-INT-95 13368, NSF-CCR-9057486, NSF-CCR- 
9319093, and NSF-CCR-9322513, and an Alfred P. Sloan Fellowship. 
there is a polynomial $p$ such that, for each oracle $A$ and for each integer $n$, the
nondeterministic runtime of $N^A$ on inputs of size $n$ is bounded by $p(n)$. (Requiring that the
polynomial upper-bounds the runtime independent of the oracle is superfluous in the
definition of $\le_T^{SN}$ but may be a nontrivial restriction in the definition of $\le_T^{RS}$; see the
discussion of this point in Section 6. The definitions used here agree with those in the
previous literature.) Consider NPTMs with three possible outcomes on each path: acc,
rej, and ?. We say $A$ strong-reduces to $B$, $A \le_T^{SN} B$, if there is an NPTM $N$ such that,
for every input $x$, it holds that (a) if $x \in A$ then $N^B(x)$ has at least one acc path and no
rej paths, and (b) if $x \notin A$ then $N^B(x)$ has at least one rej path and no acc paths. (Note
that in either case the machine may also have some ? paths.) Furthermore, we say $A$
robustly strong-reduces to $B$, $A \le_T^{RS} B$, if there is an NPTM $N$ such that $A \le_T^{SN} B$
via $N$ (in the sense of the above definition) and, moreover, for every oracle $O$ and every
input $x$, $N^O(x)$ is strong, i.e., it either has at least one acc path and no rej paths, or has
at least one rej path and no acc paths. This paper is concerned with the relative power of
these two reductions, and with reductions whose power is intermediate between theirs.

In particular, it is claimed in [6] that the following strong separation holds with
respect to the two reductions: For every recursive set $A \notin NP \cap coNP$, there is a
recursive set $B$ such that $A$ strong-reduces to $B$ but $A$ does not robustly strong-reduce
to $B$ [6, Theorem 11]. Unfortunately, there is a subtle but apparently fatal error in their
proof. One of the main contributions of this paper is that we re-establish their sweeping
theorem. Note that the zero degrees of these reducibilities are identical, namely the class
$NP \cap coNP$ [6]. Thus, in a certain sense, the above claim of Gavalda and Balcazar is
optimal (if it is true, as we prove it is), as if $A \in NP \cap coNP$ then $A$ strong-reduces to
every $B$ and $A$ also robustly strong-reduces to every $B$.

Section 3 re-establishes the above claim of Gavalda and Balcazar. The proof
(available in the full version of this paper [4]) is delicate, and is carried out in three
stages: First, we establish the result for all $A \in EXP - (NP \cap coNP)$, where
$EXP = \bigcup_{k \ge 0} DTIME[2^{n^k}]$. Here, the set $B$ produced from the proof is not
necessarily recursive. Second, we remove the restriction of $A \in EXP$, by showing that if the
result fails for $A \notin EXP$ then indeed $A \in EXP$, yielding a contradiction. The proof so
far only establishes the existence of some $B$, which is not necessarily recursive. Finally,
with the certainty that some $B$ exists, we can recast the proof and show that for every
recursive $A$ a recursive $B$ can be constructed.

The notion of “robustly strong” is made up of two components — one stating that for 
all sets and all inputs the reducing machine has at least one non- ? path, and the other 
stating that for all sets and all inputs the reducing machine does not simultaneously have 
acc and rej paths. Each component has been separately studied before in the literature, 
in different contexts. By considering each of these two requirements in conjunction with 
strong reductions, we obtain two natural new reductions whose power falls between that 
of strong reductions and that of robustly strong reductions. Section 4 studies the relative 
power of Turing reductions, of strong reductions, of robustly strong reductions, and of 
our two new reductions. In some cases we establish absolute separations. In other cases, 
we see that the relative computation power is tied to the P = NP question. Curiously, 
the two new reductions are deeply asymmetric in terms of what is currently provable 
about their properties. For one of the new reductions, we show that if it differs from 
Turing reductions then $P \ne NP$. For the other, we show that the reduction does differ
from Turing reductions. In Section 5, we discuss some issues regarding what collapses 
of the polynomial time hierarchy occur if sparse sets exist that are hard or complete for 
NP with respect to the new reductions. One of the new reductions extends the reach of 
hardness results. 

2 Two New Reducibilities 

For each NPTM $N$ and each set $D \subseteq \Sigma^*$, define $out_{N^D}(x) = \{y \mid y \in \{acc, rej, ?\} \wedge$
some computation path of $N^D(x)$ has outcome $y\}$. As is standard, for each
nondeterministic machine $N$ and each set $D \subseteq \Sigma^*$, let $L(N^D)$ denote the set of all $x$ for which
$acc \in out_{N^D}(x)$. For each nondeterministic machine $N$ and each set $D \subseteq \Sigma^*$, let
$L_{rej}(N^D)$ denote the set of all $x$ for which $rej \in out_{N^D}(x)$. A computation $N^D(x)$ is
called underproductive if $\{acc, rej\} \not\subseteq out_{N^D}(x)$. That is, $N^D(x)$ does not have as
outcomes both acc and rej. $N^D$ is said to be underproductive if, for each string $x$, $N^D(x)$
is underproductive. That is, $L(N^D) \cap L_{rej}(N^D) = \emptyset$. Underproductive machines were
introduced by Buntrock in his 1989 Ph.D. thesis. Allender et al. [2] have shown
underproductivity to be very useful in the study of almost-everywhere complexity hierarchies
for nondeterministic time classes. A computation $N^D(x)$ is called overproductive if
$out_{N^D}(x) \ne \{?\}$. A machine $N^D$ is said to be overproductive if, for each string $x$,
$N^D(x)$ is overproductive. Equivalently, $L(N^D) \cup L_{rej}(N^D) = \Sigma^*$. We say that $N$ is
robustly overproductive if for each $D \subseteq \Sigma^*$ it holds that $N^D$ is overproductive. We say
that $N$ is robustly underproductive if for each $D \subseteq \Sigma^*$ it holds that $N^D$ is
underproductive. $\le_T^p$ as always has its routine definition. Using underproductivity,
overproductivity, and robustness, we may now define strong and robustly strong reductions, which
have been previously studied. We also introduce two intermediate reductions, obtained
by limiting the robustness to just the overproductivity or the underproductivity.¹ The
trivial containment relationships are shown in Proposition 1. In this paper we will ask
whether some of the containments of Proposition 1 might in fact be equalities, and in
particular we seek necessary conditions and sufficient conditions for such.
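
The productivity conditions are properties of the set of outcomes that a single computation produces, so they can be phrased as small predicates on outcome sets. The following is a minimal sketch; the constant and function names are illustrative, and the outcome set is simply handed in as data rather than obtained by simulating a machine.

```python
ACC, REJ, UNKNOWN = "acc", "rej", "?"

def is_underproductive(outcomes):
    """True unless both acc and rej occur among the outcomes."""
    return not {ACC, REJ} <= set(outcomes)

def is_overproductive(outcomes):
    """True unless the only outcome is '?'."""
    return set(outcomes) != {UNKNOWN}

def is_strong(outcomes):
    """A computation is strong if it is both underproductive and overproductive."""
    return is_underproductive(outcomes) and is_overproductive(outcomes)

def verdict(outcomes):
    """Answer of a strong computation: True (accept), False (reject); None if not strong."""
    if not is_strong(outcomes):
        return None
    return ACC in set(outcomes)

def is_robust_on(machine, oracles, inputs, condition):
    """Check `condition` for every listed oracle and input.

    `machine(oracle, x)` is a hypothetical callable returning an outcome set.
    Real robustness quantifies over all oracles; a finite list only gives a
    necessary check.
    """
    return all(condition(machine(oracle, x)) for oracle in oracles for x in inputs)
```

For example, the outcome set `{"acc", "?"}` is strong and yields the verdict true, `{"acc", "rej"}` fails underproductivity, and `{"?"}` fails overproductivity.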

Definition 1.
1. [14], see also [16] ("strong reductions") $A \le_T^{SN} B$ if there is an
NPTM $N$ such that $N^B$ is overproductive, $N^B$ is underproductive, and $A = L(N^B)$.
2. [6] ("robustly strong reductions") $A \le_T^{RS} B$ if $A \le_T^{SN} B$ via an NPTM $N$, and $N$
is both robustly overproductive and robustly underproductive.
3. ("strong and robustly underproductive reductions" or, for short, "U-reductions")
$A \le_T^{U} B$ if $A \le_T^{SN} B$ via an NPTM $N$ that is robustly underproductive.
4. ("strong and robustly overproductive reductions" or, for short, "O-reductions")
$A \le_T^{O} B$ if $A \le_T^{SN} B$ via an NPTM $N$ that is robustly overproductive.

^ The literature contains various notations for strong reductions (also known as strong
nondeterministic reductions). We adopt the notation of Long's paper [14], i.e., $\le_{SN}^p$. However, we note
that some papers use other notations, such as <t\ and ^ For the three other 

reductions we discuss, we replace the SN with a mnemonic abbreviation. For robustly strong 
we follow Gavalda and Balcazar [6] and use RS. For brevity, we use O as our abbreviation for
our “strong and robustly overproductive” reductions, and we use U as our abbreviation for our 
“strong and robustly underproductive” reductions. 
Notation 1 For each well-defined reduction $\le_a^b$, let $\le_a^b$ denote $\{(A, B) \mid A \le_a^b B\}$.

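Proposition 1. $\le_T^p \subseteq \le_{RS}^p \subseteq \le_O^p \subseteq \le_{SN}^p$ and $\le_{RS}^p \subseteq \le_U^p \subseteq \le_{SN}^p$.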

Using different terminology, robust underproductivity (though not $\le_U^p$) has been
introduced into the literature by Beigel ([3], see also [7]), and the following theorem
will be of use in the present paper.

Theorem 2. ([3], see also [7]) If NPTM $N$ is robustly underproductive, then
$(\forall A)(\exists L \in \mathrm{P}^{\mathrm{SAT} \oplus A})[L(N^A) \subseteq L \subseteq \overline{L_{rej}(N^A)}]$.

Theorem 2 says that if a machine is robustly underproductive, then for every oracle
there is a relatively simple set that separates its acceptance set from its $L_{rej}$ set. In
particular, if P = NP and $N$ is a robustly underproductive machine, then for every
oracle $A$ it holds that $L(N^A)$ and $L_{rej}(N^A)$ are $\mathrm{P}^A$-separable.

As is standard, we say that a set $S$ is sparse if there is a polynomial $r$ such that,
for each $n$, $\|S^{\le n}\| \le r(n)$. Using different terminology, "robust with respect to sparse
sets"-overproductivity (though not $\le_O^p$) has been introduced into the literature by
Hartmanis and Hemachandra [7], and the following theorem will be of use in the present
paper.
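
The sparseness condition can be checked on finite samples with a short Python sketch. The helper names `census` and `within_bound` are hypothetical, and a finite check of this kind can only refute a proposed bound up to the tested length, never prove sparseness of an infinite set.

```python
def census(strings, n):
    """Number of distinct strings of length at most n in the sample."""
    return sum(1 for s in set(strings) if len(s) <= n)

def within_bound(strings, r, max_n):
    """Check census(strings, n) <= r(n) for every n from 0 to max_n."""
    return all(census(strings, n) <= r(n) for n in range(max_n + 1))

# A tally set contains at most one string per length, so n + 1 bounds its census.
tally_sample = {"1" * k for k in range(6)}

# All binary strings of length at most 4: the census grows exponentially.
dense_sample = {format(i, "b").zfill(length)
                for length in range(1, 5) for i in range(2 ** length)} | {""}
```

Here `within_bound(tally_sample, lambda n: n + 1, 10)` holds, while the same bound fails for `dense_sample`.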

Theorem 3. [7] If NPTM $N$ is such that for each sparse set $S$ it holds that $N^S$ is
overproductive, then for every sparse set $S$ there exists a binary predicate $b$ computable
in $\mathrm{FP}^{\mathrm{SAT} \oplus S}$ such that, for all $x$, $\{x \mid b(x)\} \subseteq L(N^S)$ and $\{x \mid \neg b(x)\} \subseteq L_{rej}(N^S)$,
where FP denotes the polynomial-time computable functions.

Theorem 3 says that if a machine is “robustly with respect to sparse oracles”- 
overproductive, then for every sparse oracle there is a relatively simple function that 
for each input correctly declares either that the machine has accepting paths or that the 
machine has rejecting paths. Crescenzi and Silvestri [5] show via Sperner's Lemma that
Theorem 3 fails when the sparseness condition is removed, and their proof approach 
will be of use in this paper. 

It is known that SN reductions and RS reductions have nonuniform characterizations.
In particular, for every reducibility $\le_a^b$ and every class $\mathcal{C}$, let
$R_a^b(\mathcal{C}) = \{A \mid (\exists B \in \mathcal{C})[A \le_a^b B]\}$. Gavalda and Balcazar proved the following result.

Theorem 4. [6] 1. $R_{SN}^p(\mathrm{SPARSE}) = \mathrm{NP/poly} \cap \mathrm{coNP/poly}$. 2. $R_{RS}^p(\mathrm{SPARSE}) =
(\mathrm{NP} \cap \mathrm{coNP})/\mathrm{poly}$.

We note in passing that the downward closures of the sparse sets under our two
new reductions have analogous characterizations, albeit somewhat stilted ones. We say
$A \in \mathrm{NP/poly} \cap \mathrm{coNP/poly}$ via the pair $(M, N)$ of NPTMs if there is a sparse set
$S$ such that $A = L(M^S)$ and $\overline{A} = L(N^S)$. Hartmanis and Hemachandra [7] defined
robustly $\Sigma^*$-spanning pairs of machines $(M, N)$ to be pairs having the property
$L(M^X) \cup L(N^X) = \Sigma^*$ for every oracle $X$, and robustly disjoint pairs to be
pairs having the property $L(M^X) \cap L(N^X) = \emptyset$ for every oracle $X$. Using these
notions we note the following characterizations. $A \in R_O^p(\mathrm{SPARSE})$ if and only if
$A \in \mathrm{NP/poly} \cap \mathrm{coNP/poly}$ via some robustly $\Sigma^*$-spanning pair $(M, N)$ of NPTMs.
$A \in R_U^p(\mathrm{SPARSE})$ if and only if $A \in \mathrm{NP/poly} \cap \mathrm{coNP/poly}$ via some robustly
disjoint pair $(M, N)$ of NPTMs.
3 A Strong Separation of $\le_{SN}^p$ and $\le_{RS}^p$

It follows from each of Section 4's Theorems 8 and 13, both of which have relatively
simple proofs, that the reducibilities $\le_{SN}^p$ and $\le_{RS}^p$ are distinct. However, more can be
said. The separation of these two reductions turns out to be extremely strong, namely,
for every recursive set $A \notin \mathrm{NP} \cap \mathrm{coNP}$, there exists a recursive set $B$ such that $A$ is
strongly reducible to $B$ but $A$ is not robustly strong reducible to $B$. This is Theorem 6.
As noted in Section 1, this claim cannot be generalized to include $\mathrm{NP} \cap \mathrm{coNP}$ since
$\mathrm{NP} \cap \mathrm{coNP}$ is the zero degree of $\le_{RS}^p$, as has been pointed out by Gavalda and
Balcazar [6]. Theorem 6 was first stated in Gavalda and Balcazar's 1991 paper [6].
The diagonalization proof given there correctly establishes $A \not\le_{RS}^p B$, but it fails to
establish $A \le_{SN}^p B$. The main error is the following: In the proof there is a passage
where the minimum word $x$ is searched for that witnesses that the machine under
consideration does not strongly reduce $A$ to $B$. If such an $x$ is found, then $B$ is augmented
by some suitably chosen word (triple). Now it is true that such an $x$ must always exist.
However, it might be huge, and then between this $x$ and the previous one, say $x'$, no
coding has been done, i.e., for all $y$ between $x'$ and $x$, no triple $(z, y, 0)$ or $(z, y, 1)$ with
$|z| = |y|$ has been added to $B$. Thus, the condition "(i)" of [6, p. 6], which is intended
to guarantee $A \le_{SN}^p B$, is violated.

We state as Theorem 5 a key claim. In our full version of this paper [4] we prove
that claim and then build on it to achieve our main result, Theorem 6.

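Theorem 5. $(\forall$ recursive $A \notin \mathrm{NP} \cap \mathrm{coNP})(\exists B)[A \le_{SN}^p B \wedge A \not\le_{RS}^p B]$.

Theorem 6. $(\forall$ recursive $A \notin \mathrm{NP} \cap \mathrm{coNP})(\exists$ recursive $B)[A \le_{SN}^p B \wedge A \not\le_{RS}^p B]$.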

More generally, our proof [4] actually shows that a B recursive in A can be found to 
satisfy the theorem. 

One can ask whether the difference of $\le_{SN}^p$ and $\le_{RS}^p$ is so strong that the following
statement holds: $(\forall$ recursive $B \notin \mathrm{NP} \cap \mathrm{coNP})(\exists$ recursive $A)[A \le_{SN}^p B \wedge A \not\le_{RS}^p B]$.
This can be reformulated in terms of reducibility downward closures: $(\forall$ recursive
$B \notin \mathrm{NP} \cap \mathrm{coNP})[R_{SN}^p(B) \neq R_{RS}^p(B)]$. However, this claim is false. Intuitively, if $B$ is
chosen to be sufficiently complex, the differences between the two reductions may be
too fine to still be distinguishable in the presence of $B$. Indeed, if for instance $B$ is an
EXPSPACE-complete set, and thus is certainly not contained in $\mathrm{NP} \cap \mathrm{coNP}$, then for
every $A \in R_{SN}^p(B) = \mathrm{NP}^B \cap \mathrm{coNP}^B = \mathrm{EXPSPACE}$ we have $A \le_m^p B$ and hence
$A \le_{RS}^p B$, i.e., $R_{SN}^p(B) = R_{RS}^p(B)$.

4 Comparing the Power of the Reductions 

Long [14] proved that strong and Turing polynomial-time reductions differ. More pre- 
cisely, he proved the following result. 

Theorem 7. [14] $(\forall$ recursive $A \notin \mathrm{P})(\exists$ recursive $B)[A \le_{SN}^p B \wedge A \not\le_T^p B]$.

Consequently, at least one of the edges in Figure 1 must represent a strict inclu- 
sion. Indeed, we can show that strong reductions differ from both overproductive and 
underproductive reductions. 
Theorem 8. 1. $(\exists$ recursive $A)(\exists$ recursive $B)[A \le_{SN}^p B \wedge A \not\le_O^p B]$. Indeed, we
may even achieve this via a recursive sparse set $B$ and a recursive tally set $A$.

2. $(\exists$ recursive $A)(\exists$ recursive $B)[A \le_{SN}^p B \wedge A \not\le_U^p B]$. Indeed, we may even
achieve this via a recursive sparse set $B$ and a recursive tally set $A$.

Next we consider the relationship between $\le_O^p$ and $\le_U^p$. Let $M$ be an NPTM.
By interchanging the accept and the reject states of $M$ we get a new NPTM
$N$ such that $L_{rej}(M) = L(N)$. If $M$ is robustly strong, we have $L(M^A) = \overline{L(N^A)}$
for every oracle $A$. The pair $(M, N)$ is what Hartmanis and Hemachandra [7] call a
robustly complementary pair of machines. For such a pair, the following is known.

Theorem 9. [7] If $(M, N)$ is a robustly complementary pair of machines, then
$(\forall A)[L(M^A) \in \mathrm{P}^{\mathrm{SAT} \oplus A}]$.

Gavalda and Balcazar [6] noted that, in view of the preceding discussion, one gets 
as an immediate corollary the following. 

Corollary 1. [6] $(\forall A, B)[A \le_{RS}^p B \Rightarrow A \in \mathrm{P}^{\mathrm{SAT} \oplus B}]$.

In fact, the proof of Theorem 9 still works if $M$ is an underproductive machine
reducing $A$ to $B$. Thus, we have the following.

Theorem 10. $(\forall A, B)[A \le_U^p B \Rightarrow A \in \mathrm{P}^{\mathrm{SAT} \oplus B}]$.

Not only is the proof of Theorem 9 not valid for $\le_O^p$, but indeed the statement of
Theorem 10 with $\le_U^p$ replaced by $\le_O^p$ is outright false. This follows as a corollary
to a proof of Crescenzi and Silvestri [5, Theorem 3.1] in which they give a very nice
application of Sperner's Lemma.

Theorem 11. $(\exists A, B)[A \le_O^p B \wedge A \notin \mathrm{P}^{\mathrm{SAT} \oplus B}]$.

As mentioned earlier, Theorem 11 follows from the proof of [5, Theorem 3.1], but
not from the theorem itself.

The preceding two theorems have the consequence of showing a deep asymmetry
between $\le_O^p$ and $\le_U^p$. This asymmetry (that $\le_O^p \neq \le_T^p$, yet to prove the analog
for $\le_U^p$ would resolve the P $\neq$ NP question) contrasts with the seemingly symmetric
definitions of these two notions. We now turn to some results that will lead to the
establishment of this asymmetry.

Theorem 12. Overproductive and underproductive reductions differ in such a way
that $\le_O^p \not\subseteq \le_U^p$.

An immediate consequence of Theorem 12 is the following. 

Theorem 13. $\le_{RS}^p \neq \le_O^p$.

We conjecture that Theorem 13 can be stated in the much stronger form of
Theorem 6, where $\le_{SN}^p$ is replaced with $\le_O^p$. From Theorem 13, it follows that
$\le_O^p \neq \le_T^p$. In addition to that, although we know that $\le_O^p$ and $\le_T^p$ differ,
it may be extremely hard to prove them to differ with a sparse set on the right-hand side.
More precisely, we have the following.
Theorem 14. $(\exists B \in \mathrm{SPARSE})[R_O^p(B) \neq R_T^p(B)] \Rightarrow \mathrm{P} \neq \mathrm{NP}$.

The fact $\le_O^p \neq \le_T^p$, stated above, sharply contrasts with the following.

Theorem 15. $\le_U^p \neq \le_T^p \Rightarrow \mathrm{P} \neq \mathrm{NP}$.

Theorem 15 strengthens in two ways the statement, noted by Gavalda and
Balcazar [6], that if $\le_{RS}^p$ differs from $\le_T^p$ anywhere on the recursive sets then
P $\neq$ NP. In particular, we have these two improvements of that statement of Gavalda
and Balcazar: (a) we improve from $\le_{RS}^p$ to $\le_U^p$, and (b) we remove the "on the
recursive sets" scope restriction.

Below, we use $X \not\le_a^b Y$ to denote that it is not the case that $X \le_a^b Y$.

Corollary 2. Y ^ Y ^ NP. 2. ^ ^ p ^ NP. 

So proving ^ ^ ^ ^i^ounts to proving 

P $\neq$ NP. In particular, we cannot hope to strengthen Theorem 6 so that it is valid
for $\le_U^p$ rather than $\le_{SN}^p$. Although we know that $\le_O^p \neq \le_U^p$, it is also difficult to
show that they differ with respect to a sparse set on the right-hand side, because we
have $(\exists B \in \mathrm{SPARSE})[R_O^p(B) \not\subseteq R_U^p(B)] \Rightarrow \mathrm{P} \neq \mathrm{NP}$, which is a consequence of
Theorem 14.

Theorem 16. 1. ^ P 7 ^ NP. 2. = < 5 . ^ P = NPDcoNP. 

5 Overproductive Reductions and the Classic Hardness Theorems 

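As is standard, the polynomial hierarchy is defined as follows: (a) $\Sigma_0^p = \mathrm{P}$; (b) for each
$i \ge 0$, $\Sigma_{i+1}^p = \mathrm{NP}^{\Sigma_i^p}$; (c) for each $i \ge 0$, $\Pi_i^p = \{L \mid \overline{L} \in \Sigma_i^p\}$; and (d) $\mathrm{PH} = \bigcup_{i \ge 0} \Sigma_i^p$.
$\Theta_2^p = \{L \mid L \le_{tt}^p \mathrm{SAT}\}$, where $\le_{tt}^p$ denotes polynomial-time truth-table reduction.
ZPP denotes expected polynomial time. It is well-known that $\mathrm{NP} \subseteq \Theta_2^p \subseteq \mathrm{P}^{\mathrm{NP}} \subseteq
\mathrm{ZPP}^{\mathrm{NP}} \subseteq \Sigma_2^p$.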

It is very natural to ask whether the existence of sparse hard or complete sets with 
respect to our new reductions would imply collapses of the polynomial hierarchy similar 
to those that are known to hold for $\le_T^p$. That is, are our reductions useful in extending
the key standard results? To study this question, we must first briefly review what is 
known regarding the consequences of the existence of sparse NP-hard sets. The classic 
result in this direction was obtained by Karp and Lipton, and more recent research has 
yielded three increasingly strong extensions of their result. 

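Theorem 17. 1. [11] $\mathrm{NP} \subseteq R_T^p(\mathrm{SPARSE}) \Rightarrow \mathrm{PH} \subseteq \Sigma_2^p$. 2. (implicit in
[11], see [15] and the discussion in [8]; explicitly achieved in [1,10]) $\mathrm{NP} \subseteq
R_{RS}^p(\mathrm{SPARSE}) \Rightarrow \mathrm{PH} \subseteq \Sigma_2^p$. 3. [13] $\mathrm{NP} \subseteq R_{RS}^p(\mathrm{SPARSE}) \Rightarrow \mathrm{PH} \subseteq \mathrm{ZPP}^{\mathrm{NP}}$.
4. [12] If $A$ has self-computable witnesses and $A \in (\mathrm{NP}^B \cap \mathrm{coNP}^B)/\mathrm{poly}$, then
$\mathrm{ZPP}^{\mathrm{NP}^A} \subseteq \mathrm{ZPP}^{\mathrm{NP}^B}$.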
To explain why part 4 of this theorem is stronger than part 3, we mention that Köbler
and Watanabe [13] state part 3 in the form $\mathrm{NP} \subseteq (\mathrm{NP} \cap \mathrm{coNP})/\mathrm{poly} \Rightarrow \mathrm{PH} \subseteq
\mathrm{ZPP}^{\mathrm{NP}}$, which is equivalent to the statement of part 3 in light of Theorem 4.

It remains open whether parts 3 or 4 of Theorem 17 can be extended from robustly 
strong reductions to overproductive reductions. However, as Theorem 18 we extend 
part 2 of Theorem 17 to overproductive reductions. As a consequence, there is at the 
present time no single strongest theorem on this topic; Theorem 18 and the final two 
parts of Theorem 17 seem to be incomparable in strength. 

Theorem 18. $\mathrm{NP} \subseteq R_O^p(\mathrm{SPARSE}) \Rightarrow \mathrm{PH} \subseteq \Sigma_2^p$.

It remains open whether Theorem 18 can in some way be extended to underproductive
reductions (our proof does not extend to that case). An analog for strong
nondeterministic reductions is implicitly known, but has a far weaker conclusion.

Theorem 19. (implicit in [13]) $\mathrm{NP} \subseteq R_{SN}^p(\mathrm{SPARSE}) \Rightarrow \mathrm{PH} \subseteq \mathrm{ZPP}^{\Sigma_2^p}$.

In contrast with the above results regarding sparse hard sets for NP, in the case of
sparse complete sets for NP we have just as strong a collapse for $\le_{SN}^p$-reductions as we
have for $\le_T^p$-reductions.

Theorem 20. ([9]) $\mathrm{NP} \subseteq R_{SN}^p(\mathrm{SPARSE} \cap \mathrm{NP}) \Rightarrow \mathrm{PH} = \Theta_2^p$.

As mentioned earlier, we leave as an open problem whether one can establish the
collapse $\mathrm{PH} \subseteq \Sigma_2^p$ (or, better still, $\mathrm{PH} \subseteq \mathrm{ZPP}^{\mathrm{NP}}$) under the assumption $\mathrm{NP} \subseteq
R_{SN}^p(\mathrm{SPARSE})$, or even under the stronger assumption that $\mathrm{NP} \subseteq R_U^p(\mathrm{SPARSE})$.
We conjecture that no such extension is possible.

6 Conclusions and Open Problems 

Define the runtime of a nondeterministic machine on a given input to be the length of its 
longest computation path. (Though in most settings this is just one of a few equivalent 
definitions, we state it explicitly here as for the about-to-be-defined notion of local- 
polynomial machines, it is not at all clear that this equivalence remains valid.) Recall 
that we required that NPTMs be such that for each NPTM, $N$, it holds that there exists
a polynomial $p$ such that, for each oracle $D$, the runtime of $N^D$ is bounded by $p$. Call
such a machine "global-polynomial" as there is a polynomial that globally bounds its
runtime. Does this differ from a requirement that for a machine $N$ it holds that, for
each oracle $D$, there is a polynomial $p$ (which may depend on $D$) such that the runtime
of $N^D$ is bounded by $p$? Call such a machine "local-polynomial" as, though for every
oracle it runs in polynomial time, the polynomial may depend on the oracle.

In general, these notions do differ, notwithstanding the common wisdom in complexity
theory that one may "without loss of generality" assume enumerations of machines
come with attached clocks independent of the oracle. (The subtle issue here is
that the notions in fact usually do not differ on enumerations of machines that will be
used with only one oracle.) The fact that they in general differ is made clear by the
following theorem. It shows that there is a language transformation that
can be computed by a local-polynomial machine, yet each global-polynomial machine
will, for some target set, fail almost everywhere to compute the set's image under the
language transformation. We write $A =^* B$ if $A$ and $B$ are equal almost everywhere,
i.e., if $(A - B) \cup (B - A)$ is a finite set.

Theorem 21. There is a function $f_N : 2^{\Sigma^*} \to 2^{\Sigma^*}$ (respectively, $f_D : 2^{\Sigma^*} \to 2^{\Sigma^*}$)
such that (1) there is a nondeterministic (respectively, deterministic) local-polynomial
Turing machine $M$ such that for each oracle $A$ it holds that $L(M^A) = f_N(A)$
(respectively, $L(M^A) = f_D(A)$), and (2) for each NPTM, i.e., each nondeterministic
global-polynomial Turing machine $M$ (respectively, DPTM, i.e., each deterministic
global-polynomial Turing machine $M$) it holds that there is a set $A \subseteq \Sigma^*$ such that
$L(M^A) \neq^* f_N(A)$ (respectively, $L(M^A) \neq^* f_D(A)$).

Though this claim may at first seem counterintuitive, its proof is almost immediate
if one is given $f_N$ and $f_D$, and so we simply give functions $f_N$ and $f_D$ satisfying
the theorem. In particular, we can use /iv (A) = {x | (3y)[(|y| < log |x|) A {y is the 
lexicographically first string in .4) A (30)[|0| = f\xz £ .4]]} and fo{A) = 
{x I (3y)[(|y| < log |x|) A (y is the lexicographically first string in .4) A (3z)[(z is one 
of the lexicographically smallest length strings in 27^") /\ xz £ A]]}. 

The difference between global-polynomial machines and local-polynomial machines
in general mappings, as just proven, may make one wonder whether the fact
that robust strong reduction is defined in terms of global-polynomial (as opposed to
local-polynomial) machines makes a difference and, if so, which definition is more
natural. Regarding the former issue, we leave it as an open question. (The above theorem
does not resolve this issue, as it deals with language-to-language transformations
defined specifically over all of $2^{\Sigma^*}$, but in contrast a robustly strong reduction must
accept a specific language only for one oracle, and for all others merely has to be
underproductive and overproductive, plus it must have the global-polynomial
property.) That is, the open question is: Does there exist a pair of sets $A$ and $B$ such that
$A \le_{RS}^p B$ (which by definition involves a global-polynomial machine) and no
nondeterministic local-polynomial Turing machine $N$ has the properties that $L(N^B) = A$
and $(\forall D \subseteq \Sigma^*)[N^D$ is both underproductive and overproductive$]$? Regarding the
question of naturalness, this is a matter of taste. However, we point out that the
global-polynomial definition is exactly that of Gavalda and Balcazar [6], and that part 2 of
Theorem 4, Gavalda and Balcazar's [6] natural characterization of robustly strong
reductions to sparse sets in terms of the complexity class $(\mathrm{NP} \cap \mathrm{coNP})/\mathrm{poly}$, seems to
depend crucially on the fact that one's machines are global-polynomial.

On the other hand, Theorem 18, though its proof seems on its surface to be dependent
on the fact that $\le_O^p$ is defined via global-polynomial machines, in fact remains true even
if $\le_O^p$ is redefined via local-polynomial machines.
Abstract. By means of different effectivities of the epigraphs and
hypographs of real functions we introduce several effectivizations of the
semi-continuous real functions. We call a real function $f$ lower
semi-computable of type one if its hypograph $\mathrm{hypo}(f) := \{(x,y) : f(x) >
y \;\&\; x \in \mathrm{dom}(f)\}$ is recursively enumerably open in $\mathrm{dom}(f) \times \mathbb{R}$; $f$
is lower semi-computable of type two if its closed epigraph $\mathrm{Epi}(f) :=
\{(x,y) : f(x) \le y \;\&\; x \in \mathrm{dom}(f)\}$ is recursively enumerably closed in
$\mathrm{dom}(f) \times \mathbb{R}$; and $f$ is lower semi-computable of type three if $\mathrm{Epi}(f)$ is
recursively closed in $\mathrm{dom}(f) \times \mathbb{R}$. These semi-computabilities and the
computability of real functions are compared. We show that type one and
type two semi-computability are independent, and that type three
semi-computability together with effective uniform continuity implies
computability, which fails for type one and type two. We also show
that the integral of a type three semi-computable real function on a
computable interval is not necessarily computable.



1 Introduction 

In recursive analysis, real numbers $x$ are usually represented by fast convergent
Cauchy sequences of rational numbers which converge to $x$. This representation
is denoted by $\rho$ and the sequence corresponding to $x$ is called a $\rho$-name of $x$.
Then, a real number $x$ is computable (more precisely $\rho$-computable), iff it has
a computable $\rho$-name. A (partial) real function $f :\subseteq \mathbb{R} \to \mathbb{R}$ is computable, iff
there is an algorithm $M$ which produces a $\rho$-name of $f(x)$ from any $\rho$-name of $x$,
for $x \in \mathrm{dom}(f)$ (cf. [4,5,8]). Such algorithms can be described by Type-2
Turing machines (TT-machines) which generalize classic Turing machines in such
a way that their inputs and outputs can be infinite sequences as well as finite
strings (see [8,9]). Thus, similar to the case of number-theoretical functions, a
real function $f$ is computable iff there is a TT-machine $M$ to compute it by means
of the representation $\rho$ (so called $(\rho, \rho)$-computability). Because any finite initial
segment of the output of a TT-machine depends only on a finite portion of the
input, any computable real function is continuous. In some sense, computability
of real functions is a kind of effectivization of continuity of real functions.

For the effectivization of semi-continuity, Ge and Nerode [2] introduced a
notion of recursive semi-continuity. A function $f : [a; b] \to \mathbb{R}$ is recursively lower
semi-continuous if its closed epigraph $\mathrm{Epi}(f) := \{(x,y) \in \mathbb{R}^2 : f(x) \le y \;\&\; x \in
[a; b]\}$ is recursively closed in the sense of [14,10]. Another effectivization of
semi-continuity is given by Weihrauch and Zheng [11], where a real function $f :\subseteq \mathbb{R} \to
\mathbb{R}$ is called lower semi-computable if there is a TT-machine $M$ such that $M$
outputs a rational left cut of $f(x)$ from the input of any $\rho$-name of $x \in \mathrm{dom}(f)$,
i.e., $f$ is $(\rho, \rho_<)$-computable, where $\rho_<$ is a representation of real numbers by
means of the left Dedekind cut. Equivalently, $f$ is lower semi-computable iff its
hypograph $\mathrm{hypo}(f) := \{(x,y) \in \mathbb{R}^2 : f(x) > y \;\&\; x \in \mathrm{dom}(f)\}$ is recursively
enumerably (r.e.) open (in $\mathrm{dom}(f) \times \mathbb{R}$), i.e. there is a computable sequence
$(B_n : n \in \mathbb{N})$ of rational open balls of $\mathbb{R}^2$ such that $\forall n \in \mathbb{N}\,(B_n \subseteq \mathrm{hypo}(f))$ and
$\bigcup_{n \in \mathbb{N}} B_n = \mathrm{hypo}(f)$. Because a recursively closed set is always r.e. closed, a
recursively lower semi-continuous function is also lower semi-computable. Besides,
it is also very natural to introduce another effective version of semi-continuous
real function by requesting recursive enumerability of the closed epigraph $\mathrm{Epi}(f)$.
Then we have altogether three kinds of effectivizations of lower semi-continuity
of a real function $f$: 1. by r.e. openness of $\mathrm{hypo}(f)$, 2. by r.e. closedness of $\mathrm{Epi}(f)$
and 3. by recursive closedness of $\mathrm{Epi}(f)$. In this paper we will call them type 1,
type 2 and type 3 lower semi-computability (1-, 2- and 3-l.s.comp., in short),
respectively. Accordingly, we can introduce type 1, type 2 and type 3 upper
semi-computabilities (1-, 2- and 3-u.s.comp., in short).

The basic properties of the above lower semi-computabilities and their
relationships will be discussed in this paper. We will show that type 1 and type 2
lower semi-computabilities are independent. In many respects, type 1
semi-computability looks more "natural" than type 2. For example, it is shown in [11]
that if the function $f$ is both 1-l.s.comp. and 1-u.s.comp., then $f$ is computable;
a 1-l.s.comp. function $f$ maps every computable sequence of real numbers to
a computable sequence of $\rho_<$-computable real numbers (i.e., $f$ is sequentially
$\rho_<$-computable) and the integral $\int_a^b f(x)\,dx$ of a 1-l.s.comp. function $f$ on a
computable interval $[a; b]$ is $\rho_<$-computable. But all of these fail for 2-l.s.comp.
functions. It is well known that a computable real function $f : [0; 1] \to \mathbb{R}$ is
effectively uniformly continuous (see e.g., [5]), i.e., there is a recursive function
$e : \mathbb{N} \to \mathbb{N}$ such that $|f(x) - f(y)| < 2^{-n}$ holds whenever $|x - y| < 1/e(n)$ for any
$x, y \in [0; 1]$ and $n \in \mathbb{N}$. This is not the case for continuous semi-computable
functions. In fact, we show that if a 3-l.s.comp. function $f$ is effectively uniformly
continuous, then $f$ must be computable.
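
To make the modulus condition concrete, here is a small Python sketch that tests a candidate modulus on a finite grid. The function $f(x) = x^2$ and the modulus $e(n) = 2^{n+1}$ are assumptions chosen for illustration (they do not come from the paper); since $|x^2 - y^2| \le 2|x - y|$ on $[0; 1]$, this modulus satisfies the definition.

```python
from fractions import Fraction

def f(x):
    """Example function on [0, 1] (an assumption for illustration)."""
    return x * x

def e(n):
    """Candidate modulus of uniform continuity for f."""
    return 2 ** (n + 1)

def modulus_holds(n, grid_size=64):
    """Check |f(x) - f(y)| < 2^-n whenever |x - y| < 1/e(n), on a rational grid."""
    points = [Fraction(i, grid_size) for i in range(grid_size + 1)]
    bound = Fraction(1, 2 ** n)
    for x in points:
        for y in points:
            if abs(x - y) < Fraction(1, e(n)) and not abs(f(x) - f(y)) < bound:
                return False
    return True
```

A grid check can only refute a proposed modulus; the bound itself follows from the inequality stated above.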

For the integral, Ge [3] asked whether a 3-l.s.comp. function $f$ always has a
computable integral $\int_a^b f(x)\,dx$ on a computable interval $[a; b]$. By a finite injury
priority construction, we construct a 3-l.s.comp. function $f : [0; 1] \to \mathbb{R}$ such
that the integral $\int_0^1 f(x)\,dx$ is $\rho_<$-computable but not $\rho$-computable.

In the next section, we first recall some definitions and basic facts
about computable real subsets of $\mathbb{R}^n$ and semi-continuous real functions. The
precise definition of semi-computable real functions and some of their basic
properties are given in Section 3. Section 4 discusses effective uniform continuity
of semi-computable functions. In the last section, Section 5, we discuss the integral of
semi-computable functions.



2 Preliminaries 

We first define some notions which will be used in this paper. For any $n \neq 0$
and $a_1, \ldots, a_n, b_1, \ldots, b_n \in \mathbb{R}$, the set $I^n := \{(x_1, \ldots, x_n) \in \mathbb{R}^n : a_i <
x_i < b_i$ for $i = 1, \ldots, n\}$ is called an open $n$-cuboid. Its left and right boundaries
are denoted by $l_i(I^n) := a_i$ and $r_i(I^n) := b_i$, respectively. If all boundaries
of $I^n$ are rational numbers, then $I^n$ is a rational open $n$-cuboid. Rational
closed $n$-cuboids can be defined similarly and denoted by $\bar{I}^n$ accordingly. The
sets of all rational open and closed $n$-cuboids are denoted by $\mathrm{Int}(n)$ and $\overline{\mathrm{Int}}(n)$,
respectively. A sequence $(r_n : n \in \mathbb{N})$ of rational numbers is computable if there
are recursive functions $f, g, h : \mathbb{N} \to \mathbb{N}$ such that $r_n = (f(n) - g(n))/(h(n) + 1)$
for all $n \in \mathbb{N}$. A double sequence $(r_{nm} : n, m \in \mathbb{N})$ of rational numbers is
computable if $(r_k : k \in \mathbb{N})$, with $r_{\langle n,m \rangle} := r_{nm}$, is computable, where $\langle \cdot, \cdot \rangle :
\mathbb{N}^2 \to \mathbb{N}$ is a computable pairing function with computable inverse functions
$\pi_1, \pi_2 : \mathbb{N} \to \mathbb{N}$. A sequence $(x_n : n \in \mathbb{N})$ of real numbers is called computable
if there is a computable double sequence $(r_{nm} : n, m \in \mathbb{N})$ of rational numbers
such that $|x_n - r_{nm}| < 2^{-m}$ holds for all $n, m \in \mathbb{N}$. A sequence $(I_m^n : m \in \mathbb{N})$ of
rational open (closed) $n$-cuboids is called computable if the sequences $(l_i(I_m^n) :
m \in \mathbb{N})$ and $(r_i(I_m^n) : m \in \mathbb{N})$, $1 \le i \le n$, of their boundaries are computable
sequences of rational numbers.
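
The definitions of a computable rational sequence and of a pairing function with computable inverses can be sketched in Python as follows. The Cantor pairing function used here is one standard choice and is an assumption of this sketch (the text only requires some computable pairing); the helper names are hypothetical.

```python
from fractions import Fraction
from math import isqrt

def rational_sequence(f, g, h):
    """Build n -> (f(n) - g(n)) / (h(n) + 1) from functions on the naturals."""
    return lambda n: Fraction(f(n) - g(n), h(n) + 1)

def pair(n, m):
    """Cantor pairing: a computable bijection from pairs of naturals to naturals."""
    s = n + m
    return s * (s + 1) // 2 + m

def unpair(k):
    """Inverse of pair; returns the two components (pi_1(k), pi_2(k))."""
    s = (isqrt(8 * k + 1) - 1) // 2
    m = k - s * (s + 1) // 2
    return s - m, m

# Example: r_n = 1 / (n + 1), obtained with f(n) = 1, g(n) = 0, h(n) = n.
r = rational_sequence(lambda n: 1, lambda n: 0, lambda n: n)
```

A double sequence $(r_{nm})$ can then be stored as the single sequence $k \mapsto r_{\pi_1(k)\pi_2(k)}$ by applying `unpair` to the index.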

Suppose that $(I_m^n : m \in \mathbb{N})$ is an effective enumeration of all rational open
$n$-cuboids of $\mathbb{R}^n$. An open set $A \subseteq \mathbb{R}^n$ is called recursively enumerably open
(r.e. open) if the set $\{m \in \mathbb{N} : \bar{I}_m^n \subseteq A\}$ is recursively enumerable. A closed set
$B \subseteq \mathbb{R}^n$ is called recursively enumerably closed (r.e. closed) if the set $\{m \in \mathbb{N} :
I_m^n \cap B \neq \emptyset\}$ is recursively enumerable. If $A \subseteq \mathbb{R}^n$ is r.e. closed (open) and its
complement $A^c$ is r.e. open (closed), then $A$ is called recursively closed (open).
By recursive invariance of r.e. sets (cf. [7]), these definitions are independent of
the recursive enumeration $(I_m^n : m \in \mathbb{N})$.

A sequence $(A_n : n \in \mathbb{N})$ of r.e. closed sets is called computable if there
is a computable double sequence $(J_{ij} : i, j \in \mathbb{N})$ of rational open $n$-cuboids of
$\mathbb{R}^n$ such that $(J_{ij} : j \in \mathbb{N})$ enumerates all rational open $n$-cuboids of $\mathbb{R}^n$ which
intersect $A_i$ for all $i \in \mathbb{N}$. Computable sequences of r.e. open sets and computable
sequences of recursive open (closed) sets can be defined accordingly.

R.e. open and r.e. closed sets have the following useful characterizations.

Theorem 1 (Brattka and Weihrauch [10]). (1) An open set $A \subseteq \mathbb{R}^n$
is r.e. open iff there is a computable sequence $(J_n : n \in \mathbb{N})$ of closed rational
$n$-cuboids of $\mathbb{R}^n$ such that $\bigcup_{n \in \mathbb{N}} J_n = A$;
(2) A closed set $A \subseteq \mathbb{R}^n$ is r.e. closed iff there is a computable sequence
$(x_n : n \in \mathbb{N})$ of real points of $\mathbb{R}^n$ such that $\{x_n : n \in \mathbb{N}\}$ forms a dense subset
of $A$.

The next theorem gives a convenient way to construct an r.e. closed real set. 

Theorem 2. Let $(A_n : n \in \mathbb{N})$ be a computable sequence of r.e. closed sets.
Then the set $A := \mathrm{cl}(\bigcup_{n \in \mathbb{N}} A_n)$ is r.e. closed, where $\mathrm{cl}(B)$ means the closure of
the set $B$.

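Now we recall the definition of semi-continuous real functions by means of
epigraphs and hypographs. For any real function $f :\subseteq \mathbb{R} \to \mathbb{R}$, its closed
epigraph $\mathrm{Epi}(f)$, epigraph $\mathrm{epi}(f)$, closed hypograph $\mathrm{Hypo}(f)$ and hypograph $\mathrm{hypo}(f)$
are defined, respectively, by:

$\mathrm{Epi}(f) := \{(x,y) \in \mathbb{R}^2 : f(x) \le y \;\&\; x \in \mathrm{dom}(f)\}$;
$\mathrm{epi}(f) := \{(x,y) \in \mathbb{R}^2 : f(x) < y \;\&\; x \in \mathrm{dom}(f)\}$;

$\mathrm{Hypo}(f) := \{(x,y) \in \mathbb{R}^2 : f(x) \ge y \;\&\; x \in \mathrm{dom}(f)\}$;
$\mathrm{hypo}(f) := \{(x,y) \in \mathbb{R}^2 : f(x) > y \;\&\; x \in \mathrm{dom}(f)\}$.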

A real function $f : A \to \mathbb{R}$ is called lower semi-continuous (l.s.c. in short)
if its hypograph $\mathrm{hypo}(f)$ is open in $A \times \mathbb{R}$, or equivalently, its closed epigraph
$\mathrm{Epi}(f)$ is closed in $A \times \mathbb{R}$. $f$ is called upper semi-continuous (u.s.c. in short) if
its closed hypograph $\mathrm{Hypo}(f)$ is closed in $A \times \mathbb{R}$, or equivalently, its epigraph
$\mathrm{epi}(f)$ is open in $A \times \mathbb{R}$. The sets of all l.s.c. and u.s.c. functions defined on $A$
are denoted by LSC($A$) and USC($A$), respectively.

Fix $A$ to be an alphabet which contains 0, 1 and all the other symbols we
need later. $A^*$ and $A^\omega$ are the sets of all finite strings and all infinite sequences of
elements from $A$, respectively. An infinite sequence $p \in A^\omega$ is called computable
if there is a computable function $f : \mathbb{N} \to A^*$ such that $p = f(0)f(1)f(2)\cdots$. Let
$\nu_Q :\subseteq A^* \to \mathbb{Q}$ be some standard notation of the set $\mathbb{Q}$ of rational numbers; $\nu_Q(u)$
is usually denoted by $\bar{u}$. For the real number set $\mathbb{R}$, besides the representation
$\rho$ by fast convergent Cauchy sequences mentioned in Section 1, which means
that $x = \rho(p)$ iff $p = u_0 u_1 u_2 \ldots$ such that $\lim_{n \to \infty} \bar{u}_n = x$ and $\forall m \forall n >
m\,(|\bar{u}_m - \bar{u}_n| < 2^{-m})$, we will use another representation $\rho_I$ defined by $x = \rho_I(p)$
iff $p = u_0 v_0 u_1 v_1 \cdots$ and $\{(u_n, v_n) : n \in \mathbb{N}\} = \{(u,v) \in \mathrm{dom}(\nu_Q)^2 : \bar{u} <
x < \bar{v}\}$, i.e., a $\rho_I$-name of $x$ enumerates all pairs $(u,v) \in \mathrm{dom}(\nu_Q)^2$ such
that $\bar{u} < x < \bar{v}$. Because $\rho$ and $\rho_I$ are recursively equivalent (see e.g., [11]), we
often do not distinguish them explicitly.
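
A fast convergent Cauchy name can be produced by a program. The Python sketch below generates rational approximations of $\sqrt{2}$ and checks the condition $|\bar{u}_m - \bar{u}_n| < 2^{-m}$ for $n > m$ on a finite prefix. The choice of $\sqrt{2}$ and the function names are assumptions for illustration; the sequence of rationals stands in for a name, whose encoding as a string over the alphabet is omitted.

```python
from fractions import Fraction
from math import isqrt

def sqrt2_name(n):
    """n-th rational of a Cauchy name of sqrt(2): floor(sqrt(2) * 2^k) / 2^k, k = n + 1."""
    k = n + 1
    return Fraction(isqrt(2 * 4 ** k), 2 ** k)

def is_fast_cauchy_prefix(name, length):
    """Check |u_m - u_n| < 2^-m for all m < n < length."""
    return all(abs(name(m) - name(n)) < Fraction(1, 2 ** m)
               for m in range(length)
               for n in range(m + 1, length))
```

Each approximation lies within $2^{-(n+1)}$ below $\sqrt{2}$, which is what makes the prefix check succeed for every length.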

Other useful representations of real numbers are $\rho_<$ and $\rho_>$, which are defined
by $\rho_<(p) = x$ ($\rho_>(p) = x$) iff $p$ enumerates all $u \in \mathrm{dom}(\nu_Q)$ such that $\bar{u} < x$
($\bar{u} > x$). That is, a $\rho_<$ ($\rho_>$)-name enumerates the left (right) Dedekind cut
of $x$. Let $\gamma$ and $\gamma'$ be two representations of $\mathbb{R}$. A real number $x$ is called
$\gamma$-computable if there is a computable sequence $p \in A^\omega$ such that $x = \gamma(p)$; and
a function $f :\subseteq \mathbb{R} \to \mathbb{R}$ is $(\gamma, \gamma')$-computable iff there is a TT-machine $M$ such
that $f(\gamma(p)) = \gamma'(f_M(p))$ for any $p \in \mathrm{dom}(f \circ \gamma)$, where $f_M :\subseteq A^\omega \to A^\omega$ is the
function computed by M.
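
The passage from a Cauchy name to a left Dedekind cut can be sketched as a semi-decision procedure. If $(\bar{u}_n)$ satisfies the fast convergence condition, then $|\bar{u}_n - x| \le 2^{-n}$, so a rational $q$ lies in the left cut exactly when $q < \bar{u}_n - 2^{-n}$ for some $n$. The Python sketch below searches for such a witness up to a stage bound; the bound, the example number $\sqrt{2}$, and the function names are assumptions of this illustration.

```python
from fractions import Fraction
from math import isqrt

def sqrt2_name(n):
    """n-th rational of a fast convergent Cauchy name of sqrt(2)."""
    k = n + 1
    return Fraction(isqrt(2 * 4 ** k), 2 ** k)

def witnessed_below(q, name, max_stage):
    """Return True once some stage n certifies q < x; None if no stage up to max_stage does."""
    for n in range(max_stage + 1):
        # u_n - 2^-n is a guaranteed lower bound on the represented real.
        if q < name(n) - Fraction(1, 2 ** n):
            return True
    return None  # no certificate found; this is not a proof that q >= x
```

Running the search over all rationals with growing stage bounds enumerates the left cut, which is why every computable real is also left-computable. The converse direction has no such procedure in general.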
The next lemma is useful to construct counterexamples.

Lemma 3 (Weihrauch and Zheng [13]). There are a $\rho_<$-computable real
number $a_1$ and a $\rho_>$-computable real number $a_2$ such that the sum $a := a_1 + a_2$
is neither $\rho_<$-computable nor $\rho_>$-computable.

For other unexplained notations please refer to [5,8,10,11].



3 Semi-Computabilities of Real Functions 

In this section we define several semi-computabilities of real functions by means
of different effectivities of their epigraphs and hypographs. Their basic properties
and mutual relations are discussed.

Definition 4. Let $X \subseteq \mathbb{R}$ and $f : X \to \mathbb{R}$.

(1) $f$ is called lower semi-computable of type one (1-l.s.comp.) if its hypograph
$\mathrm{hypo}(f)$ is r.e. open in $X \times \mathbb{R}$ and $f$ is called upper semi-computable of type one
(1-u.s.comp.) if its epigraph $\mathrm{epi}(f)$ is r.e. open in $X \times \mathbb{R}$.

(2) $f$ is called lower semi-computable of type two (2-l.s.comp.) if its closed
epigraph $\mathrm{Epi}(f)$ is r.e. closed in $X \times \mathbb{R}$ and $f$ is called upper semi-computable of
type two (2-u.s.comp.) if its closed hypograph $\mathrm{Hypo}(f)$ is r.e. closed in $X \times \mathbb{R}$.

(3) $f$ is called lower semi-computable of type three (3-l.s.comp.) if it is both
1-l.s.comp. and 2-l.s.comp., or equivalently, if $\mathrm{Epi}(f)$ is recursively closed, and $f$ is
called upper semi-computable of type three (3-u.s.comp.) if $\mathrm{Hypo}(f)$ is recursively
closed.

Remark: In the following, we will often say simply "open" ("closed")
instead of "open in $\mathrm{dom}(f) \times \mathbb{R}$" ("closed in $\mathrm{dom}(f) \times \mathbb{R}$").

Semi-computability of the third type for an upper-bounded real function
defined on a computable interval $[a; b]$ was introduced by Ge and Nerode [2],
who use the name recursive semi-continuity. First type lower
semi-computability was introduced by Weihrauch and Zheng [11], who call
it lower semi-computability.

It is obvious that for any $f : X \to \mathbb{R}$, $X \subseteq \mathbb{R}$, if $f$ is 1-, 2- or 3-l.s.comp.,
then $f$ is lower semi-continuous, and $f$ is 1-, 2- or 3-l.s.comp. iff $-f$ is 1-, 2- or
3-u.s.comp., respectively.

Theorem 5 (Weihrauch and Zheng [11]). Let $X \subseteq \mathbb{R}$ and $f : X \to \mathbb{R}$.

(1) $f$ is 1-l.s.comp. (1-u.s.comp.) iff $f$ is $(\rho, \rho_<)$- ($(\rho, \rho_>)$-) computable;

(2) If $X$ is a computable interval, then $f$ is 1-l.s.comp. (1-u.s.comp.) iff there is
a computable increasing (decreasing) sequence $(pg_n : n \in \mathbb{N})$ of rational polygon
functions on $X$ such that $f(x) = \lim_{n \to \infty} pg_n(x)$ for all $x \in X$.

Theorem 6 (Weihrauch and Zheng [11]). Let $X \subseteq \mathbb{R}$ and $f : X \to \mathbb{R}$ be
1-l.s.comp.

(1) $f$ is computable iff $f$ is also 1-u.s.comp.;
(2) $f$ is sequentially $\rho_<$-computable, i.e., $(f(x_n) : n \in \mathbb{N})$ is a computable
sequence of $\rho_<$-computable real numbers whenever $(x_n : n \in \mathbb{N})$ is a computable
sequence of real numbers;

(3) For any computable real number $a \in \mathbb{R}$, the set $\mathrm{hypo}(f, a) := \{x \in X :
f(x) > a\}$ is r.e. open in $X$;

(4) If $\liminf_{x \to a+} f(x) > f(a)$ ($\liminf_{x \to a-} f(x) > f(a)$), then $a$ is a
$\rho_>$ ($\rho_<$)-computable real number for any $a \in X$;

(5) If $X$ is a computable interval, then $\min\{f(x) : x \in X\}$ is $\rho_<$-computable.

The assertions (1)-(3) of Theorem 6 do not hold accordingly for 2-l.s.comp.
functions. By Theorem 2, a constant function $f$ which takes a noncomputable but
$\rho_>$-computable real value is a 2-l.s.comp. function; hence 2-l.s.comp. functions
are not necessarily sequentially $\rho_<$-computable. Furthermore, let $(a_n : n \in \mathbb{N})$
be a computable sequence of rational numbers such that $a := \lim_{n \to \infty} a_n$ is
neither $\rho_<$-computable nor $\rho_>$-computable (see Lemma 3), and define a function
$f : [0;1] \to \mathbb{R}$ by $f(1/2) := a$, $f(n/(2n+1)) := f((n+1)/(2n+1)) := a_n$ for
any $n \in \mathbb{N}$, with $f$ linear on all intervals $[n/(2n+1); (n+1)/(2n+3)]$ and
$[(n+2)/(2n+3); (n+1)/(2n+1)]$ for $n \in \mathbb{N}$. Then $f$ is both 2-l.s.comp. and
2-u.s.comp. but neither 1-l.s.comp. nor 1-u.s.comp., hence not computable, because
$f(1/2) = a$ is neither $\rho_<$-computable nor $\rho_>$-computable. From this example
we immediately obtain the following corollary.

Corollary 7. (1) There is a 2-l.s.comp. and 2-u.s.comp. real function which is
neither sequentially $\rho_<$-computable nor sequentially $\rho_>$-computable;

(2) There is a real function which is both 2-l.s.comp. and 2-u.s.comp. but not
computable.

For (3) of Theorem 6 we have the following negative result.

Theorem 8. There is a continuous 2-l.s.comp. function $f : [0;1] \to \mathbb{R}$ and a
computable real number $a \in \mathbb{R}$ such that $\mathrm{Epi}(f, a) := \{x \in [0;1] : f(x) \le a\}$ is
not r.e. closed.

Parts (4) and (5) of Theorem 6 hold accordingly for 2-l.s.comp. functions by
exchanging $\rho_<$ and $\rho_>$.

Theorem 9. Let $X \subseteq \mathbb{R}$ and $f : X \to \mathbb{R}$ be a 2-l.s.comp. real function.

(1) For any $a \in X$, if $\liminf_{x \to a^+} f(x) > f(a)$ ($\liminf_{x \to a^-} f(x) > f(a)$),
then $a$ is $\rho_<$-computable ($\rho_>$-computable);

(2) If $f$ is lower bounded, then $\inf\{f(x) : x \in X\}$ is $\rho_>$-computable. In
particular, if $X$ is a computable interval, then $\min\{f(x) : x \in X\}$ is $\rho_>$-computable.

Corollary 10. (1) If $f : X \to \mathbb{R}$ is 3-l.s.comp. and $X$ is a computable interval,
then $\min\{f(x) : x \in X\}$ is computable;

(2) There are real functions $f, g : X \to \mathbb{R}$ such that $f$ is 1-l.s.comp. but not
2-l.s.comp. and $g$ is 2-l.s.comp. but not 1-l.s.comp.;

(3) Let $f : X \to \mathbb{R}$ be 3-l.s.comp. and $a \in X$. If $\liminf_{x \to a^+} f(x) > f(a)$
or $\liminf_{x \to a^-} f(x) > f(a)$, then $a$ is computable.



190 



Vasco Brattka et al. 



The next theorem is about the integral of a 1-l.s.comp. function.

Theorem 11. Let $f : [a; b] \to \mathbb{R}$ be a 1-l.s.comp. real function, where $a, b \in \mathbb{R}$
are computable. Then the integral $\int_a^b f(x)\,dx$ is $\rho_<$-computable.

Theorem 11 is not true for 2-l.s.comp. functions. In Section 5 we will see
that there is a 2-l.s.comp. function $f : [0; 2] \to \mathbb{R}$ such that $\int_0^2 f(x)\,dx$ is neither
$\rho_<$- nor $\rho_>$-computable.



4 Effectively Uniform Continuity 



This section discusses uniform continuity of semi-computable real functions. A
function $f : X \to \mathbb{R}$ is called uniformly continuous if

$$\forall \varepsilon \in \mathbb{R}^+\, \exists \delta \in \mathbb{R}^+\, \forall x, y \in X\ \big(|x - y| < \delta \implies |f(x) - f(y)| < \varepsilon\big). \tag{1}$$

$f$ is effectively uniformly continuous if there is a recursive function $e : \mathbb{N} \to \mathbb{N}$
such that

$$\forall n \in \mathbb{N}\, \forall x, y \in X\ \big(|x - y| < 2^{-e(n)} \implies |f(x) - f(y)| < 2^{-n}\big). \tag{2}$$

The function $e$ is called a modulus of uniform continuity of $f$. We will see that
although a continuous 3-l.s.comp. function defined on a closed interval $[a; b]$ is
uniformly continuous, it is not necessarily effectively uniformly continuous. But
if a 3-l.s.comp. function $f : [a; b] \to \mathbb{R}$ is effectively uniformly continuous, then
it must be computable.
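Condition (2) can be made concrete with a small Python sketch. It assumes, purely for illustration, the function $f(x) = x^2$ on $[0; 1]$, for which $|f(x) - f(y)| \le 2|x - y|$, so that $e(n) = n + 1$ is a modulus of uniform continuity. The names `holds_on_grid`, `e` and `too_weak` are hypothetical, and the check runs only over a finite rational grid, so it is a sanity check rather than a proof.

```python
from fractions import Fraction

def f(x):
    """Example function on [0, 1]; |f(x) - f(y)| <= 2 |x - y| there."""
    return x * x

def holds_on_grid(modulus, n, denom=128):
    """Check condition (2) for index n on the grid {0, 1/denom, ..., 1}."""
    grid = [Fraction(i, denom) for i in range(denom + 1)]
    delta = Fraction(1, 2 ** modulus(n))
    eps = Fraction(1, 2 ** n)
    return all(abs(f(x) - f(y)) < eps
               for x in grid for y in grid if abs(x - y) < delta)

def e(n):
    """Candidate modulus: 2 * 2^-(n+1) = 2^-n."""
    return n + 1

def too_weak(n):
    """Not a modulus for f: near x = 1 the slope of f is close to 2."""
    return n
```

The second candidate fails already for $n = 2$: the grid points $x = 1$ and $y = 97/128$ are closer than $1/4$, but their images differ by more than $1/4$.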



Theorem 12. There is a continuous 3-l.s.comp. function $f : [0; 1] \to \mathbb{R}$ such
that $f$ is not effectively uniformly continuous.

Proof. (Sketch) Let $(a_n : n \in \mathbb{N})$ be a computable increasing sequence of
rational numbers such that $a := \lim_{n \to \infty} a_n$ is a $\rho_<$-computable but not a computable
real number. Define a function $f : [0; 1] \to \mathbb{R}$ by

$$f(x) := \begin{cases} a & \text{if } x = 1/2; \\ a_n & \text{if } x = n/(2n+1) \text{ or } x = (n+1)/(2n+1), \text{ for } n \in \mathbb{N}, \end{cases}$$

and $f$ is linear on the intervals $[n/(2n+1); (n+1)/(2n+3)]$ and
$[(n+2)/(2n+3); (n+1)/(2n+1)]$ for any $n \in \mathbb{N}$. We can show that $f$ is 3-l.s.comp., but $f$
is not computable because $f(1/2) = a$ is not computable and any computable
function maps a computable real number to a computable one (cf. [5]).

Now assume, for contradiction, that $f$ is effectively uniformly continuous, i.e.,
there is a recursive function $e : \mathbb{N} \to \mathbb{N}$ which satisfies condition (2). Define
a recursive function $g : \mathbb{N} \to \mathbb{N}$ by

$$g(n) := \min\Big\{ m \in \mathbb{N} : \Big| \frac{1}{2} - \frac{m}{2m+1} \Big| < 2^{-e(n)} \Big\}$$

for any $n \in \mathbb{N}$. Let $r_n := g(n)/(2g(n)+1)$. Then we have, for any $n \in \mathbb{N}$,
$|a - a_{g(n)}| = |f(1/2) - f(r_n)| < 2^{-n}$ because of (2) and $\forall n \in \mathbb{N}\,(|1/2 - r_n| < 2^{-e(n)})$, i.e.,
$(a_{g(n)} : n \in \mathbb{N})$ is a computable Cauchy sequence of rational numbers which
converges effectively to $a$. This contradicts the noncomputability of $a$. So $f$ is not
effectively uniformly continuous.

The next theorem shows that effective uniform continuity guarantees
computability of a 3-l.s.comp. real function.

Theorem 13. Let $f : [0; 1] \to \mathbb{R}$ be a 3-l.s.comp. function. If $f$ is effectively
uniformly continuous, then $f$ is computable.

5 Non-computability of the Integrals 

It is well known that any computable real function has a computable integral
on a computable interval. In Section 3 we have shown that $\int_a^b f(x)\,dx$ is
$\rho_<$-computable if $f$ is a 1-l.s.comp. function on a computable interval $[a; b]$. Ge
[3] asked whether the integral $I = \int_0^1 f(x)\,dx$ is always computable if
$f : [0; 1] \to \mathbb{R}$ is recursively lower semi-continuous in the sense of [2], i.e.,
3-l.s.comp. The following results give a negative answer to this question.

Theorem 14. There is a 3-l.s.comp. function $f : [0; 1] \to \mathbb{R}$ such that the
integral $\int_0^1 f(x)\,dx$ is $\rho_<$-computable but not computable.

Proof. We will define a l.s.c. function $f : [0; 1] \to \mathbb{R}$ such that $\mathrm{hypo}(f)$ is r.e.
open, $\mathrm{Epi}(f)$ is r.e. closed and $\int_0^1 f(x)\,dx$ is $\rho_<$-computable but not computable.

Let $a \in (0; 1)$ be a $\rho_<$-computable but not computable real number and
$(a_n : n \in \mathbb{N})$ an increasing computable sequence of rational numbers which converges to
$a$. Suppose that $((u_n, v_n) : n \in \mathbb{N})$ is a computable sequence of rational points
which enumerates all rational points of $[0; 1] \times \mathbb{R}$. Assume w.l.o.g. that $a_0 = 0$ and
$(u_0, v_0) = (0, 0)$.

We define a function $f : [0; 1] \to \mathbb{R}$ by

$$f(x) := \begin{cases} 0 & \text{if } x \ge a \ \lor\ \exists n\,\big(a_n \le x \land \exists i \le n\,(x = u_i)\big); \\ 1 & \text{otherwise.} \end{cases}$$

Note that there are only finitely many rational numbers $x < a_n$ with $f(x) = 0$,
for any $n \in \mathbb{N}$. This “finite injury priority”-like trick makes sure that $\mathrm{Epi}(f)$ is
r.e. closed and $\mathrm{hypo}(f)$ is r.e. open.

We now prove that the function $f$ defined above satisfies the properties of
the theorem. First we show that $f$ is l.s.c., i.e., $\liminf_{y \to x} f(y) \ge f(x)$ for any
$x \in [0; 1]$.

If $x \in [a; 1]$, we obviously have $\liminf_{y \to x} f(y) \ge 0 = f(x)$. For $x \in [0; a)$,
there is an $n \in \mathbb{N}$ such that $x < a_n$. Because there are at most finitely
many $y \in [0; a_n)$ such that $f(y) = 0$, and $f(y) = 1$ for all other $y \in [0; a_n)$,
we have $\liminf_{y \to x} f(y) = 1 \ge f(x)$. So $f$ is a l.s.c. function.

Now we show that $f$ is 1-l.s.comp., i.e., $\mathrm{hypo}(f)$ is r.e. open. Define, for any
$b \in [0; 1]$, a step function $f_b : [0; 1] \to \mathbb{R}$ by $f_b(x) := 1$ if $x \in [0; b)$ and $f_b(x) := 0$
if $x \in [b; 1]$. Obviously, $f_b$ is l.s.c. for any $b \in [0; 1]$. In particular, $f_n : [0; 1] \to \mathbb{R}$
defined by $f_n(x) := \min\{f(x), f_{a_n}(x)\}$ is 3-l.s.comp. and $(\mathrm{hypo}(f_n) : n \in \mathbb{N})$
is a computable sequence of r.e. open sets such that $\mathrm{hypo}(f) = \bigcup_{n \in \mathbb{N}} \mathrm{hypo}(f_n)$,
since $(a_n : n \in \mathbb{N})$ is a computable sequence of rational numbers. It follows
immediately that $\mathrm{hypo}(f)$ is also r.e. open and hence $f$ is 1-l.s.comp.

Next, we show that $f$ is 2-l.s.comp., i.e., $\mathrm{Epi}(f)$ is r.e. closed. Since $\mathrm{Epi}(f)$
is closed by lower semi-continuity, it suffices by Theorem 1 to show that there is a
computable sequence of rational points of $[0; 1] \times \mathbb{R}$ which forms a dense subset of
$\mathrm{Epi}(f)$. We define such a computable sequence $((x_n, y_n) : n \in \mathbb{N})$ inductively:
$(x_0, y_0) := (u_0, v_0) = (0, 0)$; for $n + 1$, if $(u_{n+1}, v_{n+1})$ satisfies one of
the following conditions:

$$0 \le u_{n+1} \le 1 \ \land\ v_{n+1} \ge 1; \tag{3}$$

$$a_{n+1} \le u_{n+1} \le 1 \ \land\ v_{n+1} \ge 0; \tag{4}$$

$$\exists m \le n\,(a_m \le u_{n+1} = u_m \le 1) \ \land\ v_{n+1} \ge 0, \tag{5}$$

then define $(x_{n+1}, y_{n+1}) := (u_{n+1}, v_{n+1})$. Otherwise, let $(x_{n+1}, y_{n+1}) := (u_0, v_0)$.

Obviously, $((x_n, y_n) : n \in \mathbb{N})$ is a computable sequence of rational points of
$[0; 1] \times \mathbb{R}$. We now prove by induction on $n$ that the following hold for any
$n \in \mathbb{N}$:

$$(x_n, y_n) \in \mathrm{Epi}(f); \text{ and} \tag{6}$$

$$(u_n, v_n) \in \mathrm{Epi}(f) \implies (x_n, y_n) = (u_n, v_n). \tag{7}$$

For $n = 0$: This is true because $f(0) = 0$.

For $n \to n + 1$: Suppose that (6) and (7) hold for all $m \le n$. If $(u_{n+1}, v_{n+1})$
satisfies one of the conditions (3)-(5), then $(u_{n+1}, v_{n+1}) \in \mathrm{Epi}(f)$ by the
definition of $f$. Thus $(x_{n+1}, y_{n+1}) \in \mathrm{Epi}(f)$ since $(x_{n+1}, y_{n+1}) = (u_{n+1}, v_{n+1})$.
Otherwise, $(u_{n+1}, v_{n+1}) \notin \mathrm{Epi}(f)$ and $(x_{n+1}, y_{n+1}) := (u_0, v_0)$. In both cases,
(6) and (7) hold for $n + 1$.

It follows from (6) and (7) that $((x_n, y_n) : n \in \mathbb{N})$ is a computable sequence
of rational points which consists of exactly all rational points of $\mathrm{Epi}(f)$. To see
that $\{(x_n, y_n) : n \in \mathbb{N}\}$ is dense in $\mathrm{Epi}(f)$, consider any point $(x, y) \in \mathrm{Epi}(f)$
and any open 2-cuboid $(c_1; c_2) \times (d_1; d_2)$ which contains the point $(x, y)$, i.e.,
$c_1 < x < c_2 \land d_1 < y < d_2$. If $x \ge a$ or $y \ge 1$, then there are rational numbers $u, v$
such that $x < u < c_2$ and $y < v < d_2$. Thus $(u, v) \in \mathrm{Epi}(f) \cap (c_1; c_2) \times (d_1; d_2)$. If
$x < a$ and $y < 1$, then $f(x) = 0$ because $f(x) \le y < 1$ and $\mathrm{range}(f) = \{0, 1\}$. By
the definition of $f$, there is $m \in \mathbb{N}$ such that $x = u_m$. Choose a rational number
$v$ such that $y < v < d_2$; then we also have $(u_m, v) \in \mathrm{Epi}(f) \cap (c_1; c_2) \times (d_1; d_2)$.
In both cases we have shown that there is a rational point which is contained in
$\mathrm{Epi}(f) \cap (c_1; c_2) \times (d_1; d_2)$. Because the sequence $((x_n, y_n) : n \in \mathbb{N})$ consists of all
rational points of $\mathrm{Epi}(f)$, it follows that $\{(x_n, y_n) : n \in \mathbb{N}\} \cap (c_1; c_2) \times (d_1; d_2) \ne \emptyset$.
This means that $\{(x_n, y_n) : n \in \mathbb{N}\}$ is a dense subset of $\mathrm{Epi}(f)$. By Theorem
1 and Definition 4, $f$ is a 2-l.s.comp., hence a 3-l.s.comp. function.

Finally, we show that the integral $\int_0^1 f(x)\,dx = a$. Since $\forall x \in [0; 1]\,(f(x) \le f_a(x))$
holds for the step function $f_a$, we have $\int_0^1 f(x)\,dx \le \int_0^1 f_a(x)\,dx = a$.
On the other hand, let $f_n : [0; 1] \to \mathbb{R}$ be defined by $f_n(x) :=
\min\{f(x), f_{a_n}(x)\}$ for any $x \in [0; 1]$ and $n \in \mathbb{N}$. Then $\forall x \in [0; 1]\,(f(x) \ge f_n(x))$
holds for all $n \in \mathbb{N}$, and there are only finitely many $x \in [0; 1]$, all among
the rational numbers $u_0, u_1, \ldots$, such that $f_n(x) = 0 \ne 1 = f_{a_n}(x)$. It follows
that $\int_0^1 f(x)\,dx \ge \int_0^1 f_n(x)\,dx = \int_0^1 f_{a_n}(x)\,dx = a_n$. Hence $\int_0^1 f(x)\,dx \ge a$
because $\lim_{n \to \infty} a_n = a$. Therefore $\int_0^1 f(x)\,dx = a$, i.e., the integral $\int_0^1 f(x)\,dx$ is
$\rho_<$-computable but not computable. This completes the proof of our theorem.

Corollary 15. There is a 2-l.s.comp. function $f : [0; 2] \to \mathbb{R}$ such that the
integral $\int_0^2 f(x)\,dx$ is neither $\rho_<$-computable nor $\rho_>$-computable.

Abstract. We consider two generic problems of combinatorial search
under the additive model. The first one is the problem of reconstructing
bounded-weight vectors. We establish an optimal upper bound and
observe that it unifies many known results for coin-weighing problems.
The developed technique provides a basis for the graph reconstruction
problem. An optimal upper bound is proved for the class of $k$-degenerate
graphs.



1 Introduction 

In many practical situations, one needs to obtain some information indirectly 
available through some physical device. Sometimes this implies costly or lengthy 
experiments so that the viability of the method crucially depends on the total 
number of them. Such problems are studied in the field of combinatorics called 
combinatorial search. We refer to monographs [2, 6] for a detailed account of 
modern methods and results in this area. 

Informally, a general combinatorial search problem is described by three pa- 
rameters: a universe of objects, a set of queries to the oracle and a set of possible 
answers. Objects are accessible only by the oracle. As every query to the ora- 
cle yields some information about the object, we repeat the process until we 
have enough information in order to uniquely identify the object. Our goal is to 
minimize the number of queries to the oracle. 

One can distinguish two major classes of combinatorial search problems, 
namely the adaptive and non-adaptive ones. The latter class contains all al- 
gorithms which make all queries in advance, before any answer is known. In 
contrast, an adaptive algorithm takes into account outcomes of previous queries 
in order to form a next one. The non-adaptive algorithms form a subclass of 
adaptive ones and they are generally weaker. Surprisingly, in many cases non- 
adaptive algorithms achieve the power of adaptive ones. This will be the case 
for our problems. 

In this paper we concentrate on two sets of objects. The first one is the set
of $d$-bounded weight vectors $\Omega(n, d)$, which consists of all $n$-dimensional,
non-negative integer-valued vectors of total weight (sum of components) at most
$d$. The second class is the set of $k$-degenerate graphs on $n$ vertices $v_1, \ldots, v_n$.

Wen-Lian Hsu, Ming-Yang Kao (Eds.): COCOON’98, LNCS 1449, pp. 194-203, 1998.

© Springer-Verlag Berlin Heidelberg 1998

The definition of $k$-degenerate graphs is given below. The terms “$d$-bounded weight
vector reconstruction problem” and “$k$-degenerate graph reconstruction problem”
will refer to these two sets respectively.

The set of allowed queries and the set of the oracle’s answers are crucial for the
complexity of the combinatorial search problem. For the set $\Omega(n, d)$, an allowed
query is a subset $S \subseteq \{1, \ldots, n\}$ of vector positions. The answer to such a query
$S$ is the sum of entries corresponding to indices in $S$ and will be denoted $\mu_v(S)$.
That is, if the unknown vector is $v = (a_1, \ldots, a_n)$, then $\mu_v(S) = \sum_{i \in S} a_i$.
For the class $\mathcal{G}_{n,k}$, an allowed query is a subset of vertices $Q \subseteq \{v_1, \ldots, v_n\}$. For a graph
$G = (V, E) \in \mathcal{G}_{n,k}$, the answer to the query $Q \subseteq V$ is the number of edges with
both endpoints in $Q$; we denote $\mu_G(Q) = |(Q \times Q) \cap E|$. Such a choice of queries
and answers corresponds to the additive or quantitative model of combinatorial
search.
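The two query types can be written down directly. The following Python sketch is only an illustration of the additive model described above; the function names `mu_vector` and `mu_graph`, the 1-based positions and the representation of a graph as a list of unordered vertex pairs are assumptions made for this example.

```python
def mu_vector(v, S):
    """Additive query on a vector: sum of the entries at the (1-based) positions in S."""
    return sum(v[i - 1] for i in S)

def mu_graph(edges, Q):
    """Additive query on a graph: number of edges with both endpoints in Q."""
    return sum(1 for a, b in edges if a in Q and b in Q)

# A 3-bounded weight vector and a small graph (triangle 1-2-3 with a pendant edge 3-4).
hidden_vector = (0, 2, 1, 0)
hidden_edges = [(1, 2), (2, 3), (1, 3), (3, 4)]
```

For instance, the query $S = \{2, 3\}$ on `hidden_vector` returns 3, and the query $Q = \{1, 2, 3\}$ on `hidden_edges` returns the three edges of the triangle.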

Historically, the additive model has its roots in a coin-weighing problem posed
by Söderberg and Shapiro in 1963 (see [2]). In this problem there is a finite
number of coins, some defective and some authentic. The goal is to find the set of
defective coins using the smallest possible number of weighings (or experiments).
Each experiment consists in weighing an arbitrary subset of coins, which reveals
the number of defective ones. The problem was solved by B. Lindström [11], who gave an
explicit optimal construction for the set of queries. A probabilistic proof
can be found in [7]. This result was extended in several ways. In [10] Lindström
obtained an explicit construction of a $d$-detecting matrix, which provides
an optimal reconstruction algorithm for vectors with each entry bounded by $d$.
This construction can be shown to be optimal for the class of non-adaptive
algorithms (see [9]). Paper [9] studies the coin-weighing problem where the
number of defective coins is bounded by a constant do. The upper bound of 
^log^ log n was established for the non-adaptive version of this problem. The 
naive information-theoretic lower bound for non-adaptive algorithms was im- 
proved in [3] to logn for all do < n and some constant c. Again, this 

class of objects is a proper subclass of dg-bounded weight vectors. 

To introduce main results of this paper, we point out a connection between 
coin-weighing and vector reconstruction problems. Namely, assuming all coun- 
terfeit coins are heavier, we can associate with every coin its “degree of falsity” , 
that is the difference between the coin weight and the weight of an authentic 
one. Our goal is to reconstruct the degree of falsity of every coin, i.e. the vector 
of coin overweights. A weighing of a subset of coins reveals the total overweight 
which is equal to the sum of corresponding entries of the coin overweights vector. 
This establishes correspondence between coin-weighing and vector reconstruc- 
tion problems. 

In the first part of this paper we extend previous results in the following
direction: we show that an optimal algorithm exists for the problem when only
the total overweight is known and the overweight of each individual coin is not
bounded. Furthermore, the optimal upper bound can be achieved by a
non-adaptive algorithm. This bound is of the same order as for the classical
coin-weighing problem where degrees of falsity are restricted to $\{0, 1\}$ only. Thus, we
gain a uniform viewpoint on all previously mentioned results.

196 Vladimir Grebinski

In the second part of the paper, we apply the results for bounded-weight 
vectors to reconstruction of graphs. Reconstruction of graphs covers a broad class 
of combinatorial search problems. Note that the problem of graph reconstruction 
is different from that of verifying a graph property [2] . 

In [8, 9] optimal algorithms were proposed for some classes of graphs. For
example, it was shown that $d$-bounded degree graphs have reconstruction
complexity $O(dn)$, which can be reached by a non-adaptive algorithm. Another
example is provided by general graphs, where the universe of objects is the set of
all labeled graphs on n vertices. This class has complexity matched by 

a non-adaptive algorithm. The same problem was already considered in [1] in a
slightly different setting.

While these results already cover many classes of graphs, they all assume
some local restriction (except for the extremal case of the class of all graphs). In
particular, the maximum degree of a vertex turns out to be the main parameter
in complexity bounds. We get rid of this restriction, but require a graph to be
$k$-degenerate (see Definition 2). We prove that for this graph reconstruction
problem, the lower and upper bounds asymptotically coincide up to a multiplicative
factor. Furthermore, the upper bound can be achieved by a non-adaptive algorithm.

Definitions and Conventions 

The following notation will be used throughout the paper. We assume implicitly
that all graphs are labeled and simple, i.e. without loops or multiple edges. The
weight of a vector is the sum of its entries, $\mathrm{wt}(v) = \sum_{i=1}^{n} v_i$ if $v = (v_1, \ldots, v_n)$.
The non-zero positions of a vector represent its support, $\mathrm{sp}(v) = \{i \mid v_i \ne 0\}$.
All logarithms are natural unless the base is indicated. Finally, all considered
matrices are $(0, 1)$-matrices over the ring of integers.

Throughout the paper we make several assumptions about the range of
parameters. In the first part of the paper, we consider only $n$-dimensional vectors
whose weight is bounded by $n^{1+\varepsilon}$ for an $\varepsilon > 0$. This choice excludes the range
of values where a trivial construction can be applied. In the second part of this
paper we consider only $k$-degenerate graphs with $k \le n^{\alpha}$, with $\alpha < 1$; this choice
is motivated by similar considerations.

2 Non-adaptive Vector Reconstruction Problem 

In this section we give lower and upper bounds for the complexity of
reconstruction of bounded-weight vectors by a non-adaptive algorithm. Recall that a
$d$-bounded weight vector is a vector $v = (v_1, \ldots, v_n)$ with non-negative integer
components $v_i \in \{0\} \cup \mathbb{N}$ and $\sum_{i=1}^{n} v_i \le d$. The set of all such vectors will be
denoted by $\Omega(n, d)$ or $\Omega$. An algorithm tries to reconstruct a vector from $\Omega$ by
asking for a sum of entries with indices in a set $S \subseteq \{1, \ldots, n\}$ which it is free to
choose. The complexity measure of the algorithm is the number of queries and
will be denoted by $k(n, d)$.

2.1 Separating Matrices and Bounded Weight Vectors

The notion of separating matrix plays a central role in the study of non-adaptive
algorithms for coin-weighing problems.

Definition 1. A matrix $M \in (0,1)^{k \times n}$ with $n$ columns and $k$ rows is called
separating for a set of vectors $V$ iff the function $v \mapsto M \cdot v$ is injective on $V$.

The importance of this notion is due to the following simple observation: 

Proposition 1. Constructing a non-adaptive algorithm for a coin-weighing 
problem with n coins under the additive model is equivalent to constructing a 
separating matrix with n columns. 

Indeed, let $V$ be the set of all possible input vectors. Each query can be
represented as an incidence $(0, 1)$-vector of the objects that are put in the query.
Consider the matrix $M$ whose rows correspond to queries and columns to
objects. A crucial observation is that the vector of answers for configuration $v$
coincides with the vector $M \cdot v$ (in the additive model). Since the algorithm
must distinguish between different vectors $v_1 \ne v_2$, we have $M \cdot v_1 \ne M \cdot v_2$.
Thus, $M$ is a separating matrix for $V$. On the other hand, given a separating
matrix $M$ for a set of vectors $V$ we obtain a non-adaptive algorithm by treating
rows of $M$ as incidence vectors of queries. □
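The equivalence can be checked by brute force for small parameters. The following Python sketch is an illustration only: the helper names `bounded_weight_vectors`, `answers` and `is_separating` are hypothetical, and exhaustive enumeration is feasible only for tiny $n$ and $d$.

```python
from itertools import product

def bounded_weight_vectors(n, d):
    """All non-negative integer vectors of length n with total weight at most d."""
    return [v for v in product(range(d + 1), repeat=n) if sum(v) <= d]

def answers(M, v):
    """Vector of answers M . v: one additive query per row of the (0,1)-matrix M."""
    return tuple(sum(m * x for m, x in zip(row, v)) for row in M)

def is_separating(M, vectors):
    """True iff v -> M . v is injective on the given set of vectors."""
    seen = set()
    for v in vectors:
        a = answers(M, v)
        if a in seen:
            return False
        seen.add(a)
    return True

identity = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]   # weigh every position separately
two_queries = [[1, 1, 0], [0, 1, 1]]           # two overlapping queries
one_query = [[1, 1, 1]]                        # a single query on all positions
```

With $n = 3$ and $d = 1$ the two overlapping queries already separate all four vectors, so fewer than $n$ queries suffice, whereas a single query on all positions cannot distinguish the three unit vectors.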



2.2 Lower Bound 



In this section we obtain a lower bound using the second moment method [4].
This lower bound is a factor of two away from the upper bound which will
be obtained later. The idea of the proof is to consider the set of all vectors of
weight $d$ as a uniform probability space. Then an estimate of a certain
variance will show that the images $M \cdot w$ of at least half of the vectors $w \in \Omega$
belong to a sphere of small radius if $M \in (0,1)^{k \times n}$ is a separating matrix for $\Omega$.
We then obtain an estimate of the dimension of the matrix.

Let $\Omega = \Omega(n, d) = \{(d_1, \ldots, d_n) \mid \sum_{i=1}^{n} d_i = d\}$ be a probability space with
uniform distribution (here we consider only vectors of weight exactly $d$). Then
$\mathbb{P}[d_i = j] = \binom{n+d-j-2}{n-2} \big/ \binom{n+d-1}{n-1}$. A simple calculation shows that $\mathbb{E}[d_i] = \frac{d}{n}$
and $\mathrm{Var}[d_i] = \frac{d(n+d)(n-1)}{n^2(n+1)}$. Consider a random vector $w = (d_1, \ldots, d_n) \in \Omega$,
and let $v = M \cdot w$, where $v = (v_1, \ldots, v_k)$. The first step is to estimate $\mathrm{Var}[v_i]$.

Suppose there are exactly $m$ non-zero entries in the $i$-th line of the matrix $M$.
The symmetric structure of $\Omega$ implies that $\mathrm{Var}[v_i] = \mathrm{Var}[d_{i_1} + \cdots + d_{i_m}] =
\mathrm{Var}[d_1 + \cdots + d_m]$. A direct calculation shows that

$$\mathrm{Var}[d_1 + \cdots + d_m] = \frac{d(n+d)}{n^2(n+1)} \cdot m \cdot (n - m) \le \frac{d(n+d)}{n^2(n+1)} \cdot \frac{n^2}{4}. \tag{1}$$

Together with the linearity of expectation this gives:

$$\mathbb{E}\Big[\sum_{i=1}^{k} (v_i - \mathbb{E}[v_i])^2\Big] \le \frac{k\,d(n+d)}{4(n+1)}. \tag{2}$$

From Markov's inequality it follows that:

$$\mathbb{P}\Big[\sum_{i=1}^{k} (v_i - \mathbb{E}[v_i])^2 < \frac{k\,d(n+d)}{2(n+1)}\Big] \ge \frac{1}{2}. \tag{3}$$

Hence, at least half of the vectors $v$ belong to a $k$-dimensional sphere of radius
$\sqrt{k\,d(n+d)/(2(n+1))}$. The volume of a $k$-dimensional sphere of radius $r$ is known to be
$\pi^{k/2} r^k / \Gamma(k/2 + 1)$, which for this radius is at most $\big(c_1 d(n+d)/(n+1)\big)^{k/2}$ for
a constant $c_1$. Therefore, by a volume argument,

$$\Big(\frac{c_1 d(n+d)}{n+1}\Big)^{k/2} \ge \frac{1}{2}\binom{n+d-1}{n-1}. \tag{4}$$

From this we obtain:

$$k \ge \frac{2\min(n-1, d)\,\log\Big(1 + \frac{\max(n-1,\,d)}{\min(n-1,\,d)}\Big)}{\log d + \log\Big(1 + \frac{d-1}{n+1}\Big) + \log c_1}. \tag{5}$$

Considering the two cases $d \le n - 1$ and $d > n - 1$ and taking into account that
$d \le n^{1+\varepsilon}$, we can further simplify the last expression and formulate the result in
the following theorem:

Theorem 1. There exists an absolute constant $c$ such that for all $n \to \infty$ and
$d \le n^{1+\varepsilon}$

$$k(n+1, d) \ge \frac{2\min(n, d)\,\log\Big(1 + \frac{\max(n,\,d)}{\min(n,\,d)}\Big)}{(1 + 2\varepsilon)\log\min(n, d) + c}. \tag{6}$$
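The moments used in the lower-bound argument can be verified by exhaustive enumeration for small parameters. The sketch below is a sanity check under stated assumptions, not part of the proof: it assumes the closed forms $\mathbb{E}[d_i] = d/n$, $\mathrm{Var}[d_i] = d(n+d)(n-1)/(n^2(n+1))$ and $\mathrm{Var}[d_1 + \cdots + d_m] = d(n+d)\,m(n-m)/(n^2(n+1))$ for the uniform distribution on vectors of weight exactly $d$, and the helper names are hypothetical.

```python
from fractions import Fraction
from itertools import product

def compositions(n, d):
    """All non-negative integer vectors of length n with weight exactly d."""
    return [v for v in product(range(d + 1), repeat=n) if sum(v) == d]

def mean_and_variance(values):
    """Exact mean and variance of a list of integers under the uniform distribution."""
    values = [Fraction(x) for x in values]
    mean = sum(values) / len(values)
    var = sum((x - mean) ** 2 for x in values) / len(values)
    return mean, var

def moments_match(n, d):
    """Compare enumerated moments with the closed forms assumed in the lead-in."""
    omega = compositions(n, d)
    mean, var = mean_and_variance([v[0] for v in omega])
    ok = mean == Fraction(d, n)
    ok = ok and var == Fraction(d * (n + d) * (n - 1), n * n * (n + 1))
    for m in range(1, n + 1):
        # Variance of the sum of the first m coordinates (one row with m ones).
        _, var_m = mean_and_variance([sum(v[:m]) for v in omega])
        ok = ok and var_m == Fraction(d * (n + d), n * n * (n + 1)) * m * (n - m)
    return ok
```

Since $m(n - m) \le n^2/4$, the enumerated variances also respect the upper bound used in (1).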



2.3 Upper Bound for the Vector Reconstruction Problem 



In this section we apply the probabilistic method [7, 4] to obtain an upper bound
on the dimension of a separating matrix $M$ for the set $\Omega(n, d)$ of $d$-bounded
weight vectors. The general idea is to consider a set of “bad” events, defined by
critical pairs, and estimate the probability, for a uniformly drawn matrix, that
any of them takes place. When this probability is strictly below 1, there is a
matrix for which no “bad” event occurs. Thus, we will estimate the dimension of
the matrix $M$.

For two different vectors $v_1, v_2 \in V$ and a matrix $M$, we define a characteristic
function $\chi(v_1, v_2, M)$:

$$\chi(v_1, v_2, M) = \begin{cases} 1 & \text{if } M v_1 = M v_2, \\ 0 & \text{otherwise.} \end{cases}$$


For a matrix $M$ which is not a separating matrix for $\Omega$ we can find two witness
vectors $a = (a_1, \ldots, a_n)$, $b = (b_1, \ldots, b_n)$ that enjoy two additional properties:

1. $\mathrm{sp}(a) \cap \mathrm{sp}(b) = \emptyset$. Otherwise, consider $(a', b')$, where $a' = (a'_1, \ldots, a'_n)$,
$b' = (b'_1, \ldots, b'_n)$, $a'_i = a_i - \min(a_i, b_i)$, $b'_i = b_i - \min(a_i, b_i)$. Obviously,
$\mathrm{wt}(a') \le \mathrm{wt}(a)$, $\mathrm{wt}(b') \le \mathrm{wt}(b)$ and $M a' = M b'$ when $M a = M b$.

2. $\mathrm{wt}(a) = \mathrm{wt}(b)$. This can be ensured by adding to $M$ an additional row with
all entries equal to 1. We implicitly assume this row is always present in the
matrix $M$.
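The reduction in property 1 is easy to carry out explicitly. The following Python sketch is an illustration with hypothetical helper names (`reduce_pair`, `support`, `apply`); it subtracts the componentwise minimum from both witness vectors and relies only on the linearity of $v \mapsto M v$.

```python
def reduce_pair(a, b):
    """Subtract the componentwise minimum so that the supports become disjoint."""
    common = [min(x, y) for x, y in zip(a, b)]
    a_reduced = tuple(x - c for x, c in zip(a, common))
    b_reduced = tuple(y - c for y, c in zip(b, common))
    return a_reduced, b_reduced

def support(v):
    """Set of (0-based) positions with a non-zero entry."""
    return {i for i, x in enumerate(v) if x != 0}

def apply(M, v):
    """Matrix-vector product over the integers."""
    return tuple(sum(m * x for m, x in zip(row, v)) for row in M)

# A non-separating matrix (including the all-ones row) and a witness pair for it.
M = [[1, 1, 1, 1], [0, 1, 0, 0]]
a, b = (2, 1, 0, 1), (1, 1, 2, 0)
a_reduced, b_reduced = reduce_pair(a, b)
```

Here $M a = M b = (4, 1)$, and the reduced pair $(1, 0, 0, 1)$, $(0, 0, 2, 0)$ still collides under $M$ while having disjoint supports and equal weight.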

An ordered pair of vectors $v_1, v_2 \in \Omega(n, d)$ satisfying the two properties above
is said to be a critical pair. Let $C = C(\Omega)$ be the set of all critical pairs. We have

$$\mathbb{P}[M \text{ is not separating for } \Omega] = \mathbb{P}\Big[\bigvee_{(v_1, v_2) \in C} (\chi(v_1, v_2, M) = 1)\Big]. \tag{7}$$

We estimate this probability from above:

$$\mathbb{P}\Big[\bigvee_{(v_1, v_2) \in C} (\chi(v_1, v_2, M) = 1)\Big] \le \sum_{(v_1, v_2) \in C} \mathbb{P}[\chi(v_1, v_2, M) = 1]. \tag{8}$$

From now on we assume the uniform distribution over $k \times n$ matrices $M$, except
for the implicit row of all 1's mentioned above. The idea of obtaining an upper
bound is to find the smallest $k$ which makes the above sum smaller than 1. The
first step is to obtain an upper bound for $\mathbb{P}[\chi(v_1, v_2, M) = 1]$.

Lemma 1. Given a critical pair {vi^V2) (ind M uniformly distributed over 

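Proof. Let $\xi_1, \ldots, \xi_n$ be a set of independent random variables with $\mathbb{P}[\xi_i = 0] =
\mathbb{P}[\xi_i = 1] = 1/2$. The event $M v_1 = M v_2$ is equivalent to $k$ independent events
corresponding to the equality in each row. Therefore,

$$\mathbb{P}[M v_1 = M v_2] = \mathbb{P}[\langle s, v_1 \rangle = \langle s, v_2 \rangle]^k, \tag{10}$$

where $s = (\xi_1, \ldots, \xi_n)$, and $\langle s, v_i \rangle$ is the inner product of $s$ and $v_i$. Since
$\mathrm{sp}(v_1) \cap \mathrm{sp}(v_2) = \emptyset$, $\langle s, v_1 \rangle$ and $\langle s, v_2 \rangle$ are independent and, by the
Cauchy-Schwarz inequality,

$$\mathbb{P}[\langle s, v_1 \rangle = \langle s, v_2 \rangle] = \sum_i \mathbb{P}[\langle s, v_1 \rangle = i] \cdot \mathbb{P}[\langle s, v_2 \rangle = i] \le \tag{11}$$

$$\sqrt{\sum_i \mathbb{P}[\langle s, v_1 \rangle = i]^2} \cdot \sqrt{\sum_i \mathbb{P}[\langle s, v_2 \rangle = i]^2}. \tag{12}$$

The sum $\sum_i \mathbb{P}[\langle s, v_j \rangle = i]^2$, $j = 1, 2$, can be bounded from above by
$\max_i \mathbb{P}[\langle s, v_j \rangle = i]$. Indeed, consider an arbitrary integer-valued random variable
$\zeta$ and let $p_{\max}(\zeta) = \max_{i \in \mathbb{Z}} \mathbb{P}[\zeta = i]$. Then $\sum_i \mathbb{P}[\zeta = i]^2 \le
p_{\max}(\zeta) \sum_i \mathbb{P}[\zeta = i] = p_{\max}(\zeta)$. Therefore, we can weaken (12) to

$$\mathbb{P}[\langle s, v_1 \rangle = \langle s, v_2 \rangle] \le \sqrt{p_{\max}(\langle s, v_1 \rangle)} \cdot \sqrt{p_{\max}(\langle s, v_2 \rangle)}. \tag{13}$$

To estimate $p_{\max}(\langle s, v \rangle)$ we need the following technical proposition.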

Proposition 2. Let t be a natural number^ ai, . . . ,a^ > 0, and be 

independent random variables with P = 0] = P [fi = 1] = 1/2. Then 
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1- 2 ^ for allt> 1 , 

2. 'Prnaxi^l + * * * + ^t) = 2 

3. Prnax («lCl + • • • + Oxt^t) < J?mas(Cl + • • • + ^t), 



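Proof. 1. For big $t$ the inequality easily follows from Stirling's formula. The
constant was chosen to satisfy the inequality for all $t \ge 1$.

2. This is obvious since $\mathbb{P}[\xi_1 + \cdots + \xi_t = i] = 2^{-t}\binom{t}{i} \le 2^{-t}\binom{t}{\lfloor t/2 \rfloor}$.

3. Let $p^{\alpha}_{\max} = p_{\max}(\alpha_1 \xi_1 + \cdots + \alpha_t \xi_t)$. By definition, there is a value $s$ such that
$\mathbb{P}[\alpha_1 \xi_1 + \cdots + \alpha_t \xi_t = s] = p^{\alpha}_{\max}$. Consider the family $\mathcal{F} = \{A \subseteq \{1, \ldots, t\} \mid \sum_{i \in A} \alpha_i =
s\}$. Clearly, $\mathrm{card}(\mathcal{F}) \cdot 2^{-t} = p^{\alpha}_{\max}$. Since $\alpha_i > 0$, $\mathcal{F}$ is a Sperner family of
sets, that is, there are no two sets $A, B \in \mathcal{F}$ such that $A \subset B$. By Sperner’s
theorem [4], $\mathrm{card}(\mathcal{F}) \le \binom{t}{\lfloor t/2 \rfloor}$.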
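Parts 2 and 3 of the proposition can be checked numerically for small $t$. The following Python sketch enumerates all $2^t$ outcomes of the fair bits; the function names `p_max` and `central` are hypothetical, and integer coefficients are used only to keep the arithmetic exact.

```python
from collections import Counter
from fractions import Fraction
from itertools import product
from math import comb

def p_max(alphas):
    """Largest point probability of alpha_1*xi_1 + ... + alpha_t*xi_t for fair 0/1 bits."""
    t = len(alphas)
    counts = Counter(sum(a * x for a, x in zip(alphas, bits))
                     for bits in product((0, 1), repeat=t))
    return Fraction(max(counts.values()), 2 ** t)

def central(t):
    """2^-t times the central binomial coefficient binom(t, floor(t/2))."""
    return Fraction(comb(t, t // 2), 2 ** t)
```

For equal coefficients the maximum is attained exactly, for example `p_max([1] * 5) == central(5)`, while for distinct positive coefficients such as `[1, 2, 3, 4]` the largest point probability is $2/16$, well below `central(4)` $= 6/16$.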

We return to the proof of Lemma 1. To bound $p_{\max}(\langle s, v_j \rangle)$, $j = 1, 2$, we apply
Proposition 2 with $t = |\mathrm{sp}(v_j)|$. We have

By (10), (13), the Lemma follows. □ 

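Let $C_w = \{(v_1, v_2) \mid \mathrm{wt}(v_1) = \mathrm{wt}(v_2) = w \text{ and } \mathrm{sp}(v_1) \cap \mathrm{sp}(v_2) = \emptyset\}$ and rewrite
the right-hand side of (8) as

$$\sum_{(v_1, v_2) \in C} \mathbb{P}[\chi(v_1, v_2, M) = 1] = \sum_{w=1}^{d} \sum_{(v_1, v_2) \in C_w} \mathbb{P}[\chi(v_1, v_2, M) = 1]. \tag{14}$$

Using Lemma 1, we bound the inner sum for some fixed $w$.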



Ec^ IP[x(^1-^2,M) = 1] < 






f 9'|sp(-U2)| 



( 15 ) 



E '^1 ’ ( 9 . 1 sp ( XI 

wt(x)]^) = w ^ 



\ 



E 1 E 

wt(i; = , 

|sp(-i;i)|=s 






- (Er=i (s)(4-i) (^) ^) (16) 



The last inequality is obtained by dropping the condition $\mathrm{sp}(v_1) \cap \mathrm{sp}(v_2) = \emptyset$.
Next we used the fact that there are $\binom{n}{s}\binom{w-1}{s-1}$ vectors $v_1$ of weight $w$ with
$|\mathrm{sp}(v_1)| = s$, which follows from simple combinatorial considerations.

Now we are left with the technical problem of finding the smallest possible $k$
which makes (16) smaller than $1/d$. This will make (14) smaller than 1 and achieve
our goal. Finding such $k$ requires some routine calculations that we omit. The
following theorem gives the final result.

Theorem 2. There exist absolute constants $C_1, C_2, C_3$ such that for all $n \ge 1$
there exists a $k \times n$ separating matrix for the set of $d$-bounded weight vectors
with $k(n, d)$ bounded as

$$k(n, d) \le \frac{4\min(n, d)\,\log\big(C_1 \cdot \max(n, d) / \min(n, d)\big)}{\log\min(n, d) + C_2} + C_3 \log d. \tag{17}$$



Comparing (17) with the lower bound (6), we conclude that the upper bound (17) is
within a factor of $2(1 + 2\varepsilon)$ of the lower bound provided that $d \le n^{1+\varepsilon}$ for
our fixed parameter $\varepsilon > 0$.

3 Non-adaptive Reconstruction of k-Degenerate Graphs

In this section we study the complexity of non-adaptive algorithms which
reconstruct the class of $k$-degenerate graphs. This class of graphs is large enough
to contain $k$-bounded degree graphs, sums of $k/2$ trees and other interesting
structures.

Definition 2. A graph $G = (V, E)$ is called $k$-degenerate if there exists an
ordering of vertices $V = \{v_1, \ldots, v_n\}$ such that for every $i$ we have $\deg(v_i) \le
k$ in the subgraph induced by the vertices $\{v_1, \ldots, v_i\}$.

The class of $k$-degenerate graphs on $n$ vertices will be denoted $\mathcal{G}_{n,k}$. For example,
every tree is 1-degenerate and planar graphs are 5-degenerate (see [5]). Note that our
definition is equivalent to the one in [5]. We mention that $k$-degenerate graphs
are $(k + 1)$-colorable and have at most $n \cdot k - k(k+1)/2$ edges. For other properties of
$k$-degenerate graphs see [5].
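Whether a given graph is $k$-degenerate can be decided greedily. The sketch below is an illustration and not part of the reconstruction algorithm: it uses the standard minimum-degree peeling procedure, reads the definition as requiring an ordering in which every vertex has at most $k$ neighbours among the vertices preceding it, and uses hypothetical names (`degeneracy`, `max_back_degree`) with vertices labeled $1, \ldots, n$.

```python
def degeneracy(n, edges):
    """Return (k, order): the smallest k for which the graph is k-degenerate and a witness order."""
    adj = {v: set() for v in range(1, n + 1)}
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    remaining = set(adj)
    k = 0
    removed = []
    while remaining:
        # Repeatedly peel off a vertex of minimum degree in the remaining graph.
        v = min(remaining, key=lambda u: (len(adj[u] & remaining), u))
        k = max(k, len(adj[v] & remaining))
        removed.append(v)
        remaining.remove(v)
    # Reversing the peeling order puts each vertex after its "late" neighbours.
    return k, removed[::-1]

def max_back_degree(order, edges):
    """Largest number of neighbours a vertex has among the vertices before it in `order`."""
    position = {v: i for i, v in enumerate(order)}
    back = {v: 0 for v in order}
    for a, b in edges:
        later = a if position[a] > position[b] else b
        back[later] += 1
    return max(back.values())

path = [(1, 2), (2, 3), (3, 4)]
cycle = [(1, 2), (2, 3), (3, 4), (1, 4)]
complete = [(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4)]
```

On these examples the path (a tree) is 1-degenerate, the 4-cycle is 2-degenerate and the complete graph on four vertices is 3-degenerate.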

Let $\mu_G(X)$ be the query function, i.e. the number of edges of the graph $G$
with both endpoints in $X$. The complexity $c(\mathcal{G})$ of graph reconstruction for a class of
graphs $\mathcal{G}$ is the number of queries sufficient to uniquely identify every graph in
$\mathcal{G}$.

Theorem 3. For any constant $\alpha < 1$ there are two constants $b_\alpha$ and $c_\alpha$ such
that for all $k \le n^{\alpha}$

$$b_\alpha\, nk \le c(\mathcal{G}_{n,k}) \le c_\alpha\, nk. \tag{18}$$

We start the proof by establishing the lower bound. Next we reformulate our
problem in terms of bipartite graphs and finally apply the techniques developed
for bounded weight vectors.



Proof of the lower bound: To establish the information-theoretic lower bound
we need to estimate from below the number $N(n, k)$ of $k$-degenerate graphs with
$n$ vertices. To obtain a $k$-degenerate graph with $m + 1$ vertices one can take a
$k$-degenerate graph with $m$ vertices and choose any $k$ vertices to be adjacent to
the new vertex $v_{m+1}$. Since this can be done in $\binom{m}{k}$ ways, we obtain the following
estimate:

$$N(n+1, k) \ge \prod_{i=k+1}^{n} \binom{i}{k}. \tag{19}$$

As mentioned above, the number of edges in a $k$-degenerate graph is
at most $kn - k(k+1)/2$. From (19), our assumption $k \le n^{\alpha}$ and the asymptotic
$n! \sim (n/e)^n$ we obtain the following information-theoretic lower bound:

$$\log_{k\left(n+1-\frac{k+1}{2}\right)} N(n+1, k) \ \ge\ \frac{nk(\log n - \log k - 1)}{\log n + \log k} \ \ge\ \frac{1-\alpha}{1+\alpha}\, nk + o(nk).$$



Therefore, we can set 



□ 



Proof of the upper bound: In order to prove the upper bound, we reduce
our problem to a problem of reconstructing a bipartite graph of special form.
Specifically, we reduce the graph $G = (V, E)$ and query function $\mu(X)$ to a
bipartite graph $G' = (V_1, V_2; E')$ and a new query function $\mu'$. Here $G'$ is the
bipartite representation of $G$, i.e. $V_1$ and $V_2$ are copies of $V$, and there is an
edge between the copy of $v_i$ in $V_1$ and the copy of $v_j$ in $V_2$ iff $(v_i, v_j) \in E$. The
query function $\mu'(X, Y)$ for $X \subseteq V_1$ and $Y \subseteq V_2$ is defined to be
$\mu'(X, Y) = |E' \cap (X \times Y)|$, the number of edges between $X$ and $Y$.

Lemma 2. One query $\mu'(\cdot, \cdot)$ can be evaluated by five queries $\mu(\cdot)$.

Proof. In [9] it was shown that for arbitrary $X \subseteq V_1$, $Y \subseteq V_2$ one query $\mu'$ can
be simulated by five queries $\mu$:

$$\mu'(X, Y) = \mu\big((X \setminus Y) \cup (Y \setminus X)\big) - 2\big(\mu(X \setminus Y) + \mu(Y \setminus X)\big) + \mu(X) + \mu(Y). \qquad \Box$$
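The five-query simulation can be checked exhaustively on a small graph. The sketch below assumes the identity in the form $\mu'(X, Y) = \mu((X \setminus Y) \cup (Y \setminus X)) - 2(\mu(X \setminus Y) + \mu(Y \setminus X)) + \mu(X) + \mu(Y)$, with $X$ and $Y$ identified with subsets of $V$, and it counts an edge with both endpoints in $X \cap Y$ twice in $\mu'$, as the bipartite representation does. All function names are hypothetical.

```python
from itertools import combinations

def mu(edges, X):
    """Number of edges with both endpoints in X."""
    return sum(1 for a, b in edges if a in X and b in X)

def mu_bipartite(edges, X, Y):
    """Number of pairs (x, y) in X x Y that are joined by an edge of G."""
    return sum((a in X and b in Y) + (b in X and a in Y) for a, b in edges)

def mu_bipartite_via_five(edges, X, Y):
    """The same value computed from five additive queries on G."""
    return (mu(edges, (X - Y) | (Y - X))
            - 2 * (mu(edges, X - Y) + mu(edges, Y - X))
            + mu(edges, X) + mu(edges, Y))

def subsets(vertices):
    """All subsets of the vertex set, as frozensets."""
    return [frozenset(c) for r in range(len(vertices) + 1)
            for c in combinations(vertices, r)]

def identity_holds(vertices, edges):
    """Exhaustively compare both ways of computing mu' on all pairs (X, Y)."""
    return all(mu_bipartite(edges, X, Y) == mu_bipartite_via_five(edges, X, Y)
               for X in subsets(vertices) for Y in subsets(vertices))

vertices = [1, 2, 3, 4, 5]
edges = [(1, 2), (2, 3), (3, 4), (4, 5), (1, 5), (1, 3)]
```

For the 5-vertex example all $32 \times 32$ pairs of subsets agree; for instance, with $X = Y = V$ both sides equal twice the number of edges.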

We are going to explicitly describe a family of queries /XQ/(Jt^, Yfi that recon- 
struct G^ uniquely provided that G^ corresponds to a /^-degenerate graph G as 
above. Let {Qj}fYi be a family of sets corresponding to rows of a matrix that 
is separating for the set of /^-bounded weight vectors. Theorem 2 states that 
rn = as n — cx>. Recall that for a given /^ -bounded weight vector 

V = (x;i, . . . values Sj = uniquely define v. Let {Pi}\^i be a fam- 

ily of sets corresponding to rows of a matrix which is separating for the set of 
(2n/^) -bounded weight vectors. Theorem 2 implies that I = 

Lemma 3. Values uniquely identify graph Gh 

Proof, The proof relies on the following essential properties of reconstruction of 
bounded-weight vectors: 

1. For fixed the value of jafi{vr}^Qj) can be uniquely reconstructed for all 

r = Indeed, E”=i Qj) < E”=i V") = = 

2 /xg(F) < 2nk. Consider a vector w = (xci, . . . ,xc^), where Wr = Qj)- 

By the choice of {P^}, vector w is uniquely defined by values of the sum XI 

for i = 1 . . . / , which are known, since by definition of pf Qj)' 

2 . Fix an order on vertices of 1 ^ 2 : • • • which is compatible with the 

definition of /^-degenerate graph. Thus pfi{vi}^ ^ 

3. Consider a vertex v\ G and vector e = (ci,...,e^), where = 
fP j {Pi}) 1 fhe incidence vector of v\ in Gk If one reconstructs e one will 
find all vertices adjacent to v\. By Step 3, v\ has at most k adjacent vertices in 

so the values Xa;gQ {] = 1 • • • uniquely define e by 

the property of {Qj}. According to Step 3, the values Sj = pfi{vi}^Qj) can be 
reconstructed for all j = 1 . . . m, which proves that vector e can be reconstructed 
and all vertices adjacent to v\ can be found. 

4. To proceed to vertex $v_2$, we “exclude” vertex $v_1$ from graph $G$ and update
$\mu^b(P_i, Q_j)$. This can be done without additional queries due to the additive
nature of $\mu^b$. Namely, given an edge $(v_1, w)$, we subtract 2 from $\mu^b(P_i, Q_j)$
if both ordered pairs $(v_1, w)$ and $(w, v_1)$ lie in $P_i \times Q_j$, we subtract 1 if
exactly one of them does, and we do not change the value if
$\{(v_1, w), (w, v_1)\} \cap (P_i \times Q_j) = \emptyset$.

5. We repeat the process for $v_2, v_3, \ldots, v_{n-1}$.

6. It is possible that there are several orders on vertices compatible with the
definition of $k$-degenerate graphs. The uniqueness of reconstruction follows from
the fact that at the $i$-th step we reconstruct exactly those edges which are
incident to $v_i$ in the graph. This implies that different graphs have different values
of $\mu^b(P_i, Q_j)$.

□ 

The total number of queries $\mu^b$ is $m \cdot l = O(nk)$. The reduction between $\mu^b$
and $\mu$ gives a factor of 5, according to Lemma 2. Thus, Theorem 3 follows. □

4 Open Problems 

A plausible conjecture is that the result of Theorem 3 holds for graphs with
a specified number of edges (i.e. $|E| = nk$), but we are unable to prove it with
our technique.
Abstract. We consider the problem raised by Bassalygo: "What is the
maximum number of rearrangements required by a rearrangeable 3-stage Clos
network when there is an auxiliary middle switch carrying a light load?" For a
3-stage Clos network with an auxiliary middle switch carrying $s$ connections,
he claimed that the maximum number of rearrangements $\varphi_1(n,n,r;s)$ is less
than $s + \sqrt{2s} + 1$. In this paper, we give a lower bound $3\lfloor s/2 \rfloor$ and an upper
bound $2s + 1$, where the lower bound shows that the upper bound given by
Bassalygo does not hold in general.



1 Introduction 

The 3-stage Clos network $\nu(n,m,r)$ is the most basic multistage interconnection
network and has been widely studied. The first stage of $\nu(n,m,r)$ consists of $r$
$n \times m$ crossbars, the second stage of $m$ $r \times r$ crossbars, and the third stage of
$r$ $m \times n$ crossbars, and the connections between adjacent stages are complete
bipartite graphs (see Fig. 1).

A request is a pair of an input and an output requesting a connection. The requests
come in sequentially; once connected, they turn into connections which can be released
at any time. A network is rearrangeable if, provided that existing connections may be
rearranged (rerouted), a request can always be routed, meaning that there exist
link-disjoint paths, one for each connection (including the request). It is
well known [2] that $\nu(n,n,r)$ is rearrangeable, while $\nu(n,n-1,r)$ is not.

Let $\varphi(n,n,r)$ denote the maximum number of rearrangements required to
guarantee the routing of any request in any network state. So far, all rearrangement
algorithms use Paull's matrix [3] (see Sec. 2), with all rearranged connections lying on
one path in Paull's matrix. We call this a one-path algorithm and let $\varphi_1(n,n,r)$ denote
the minimum number of rearrangements required under a one-path algorithm. Paull
proved

$\varphi_1(n,n,r) \le 2(r-1)$, (1)

and Benes [2] improved this to $r - 1$.

Fig. 1. $\nu(n,m,r)$

Bassalygo [1] considered the case that there exists a third middle switch c with a
light load $s$. By taking advantage of the existence of c, he claimed

$\varphi_1(n,n,r;s) < s + \sqrt{2s} + 1$. (2)

He gave an outline of a proof but no details. He also commented that

$\varphi(n,n,r;s) = s + 1$ (3)

is probable. In this paper we give a lower bound

$\varphi_1(n,n,r;s) \ge 3\lfloor s/2 \rfloor$. (4)

Note that

$3\lfloor s/2 \rfloor > s + \sqrt{2s} + 1$ for $s \ge 16$, (5)

thus invalidating the upper bound of Bassalygo. We also give an example showing that

$\varphi(n,n,r;s) \ge s + 2$. (6)
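The comparison between the lower bound (4) and Bassalygo's claimed bound can
be tabulated directly. The following is a small sketch under one stated
assumption: the claimed bound is taken to be $s + \sqrt{2s} + 1$ (the radicand $2s$
is the reading assumed here); the function names are illustrative.

```python
import math


def lower_bound(s):
    """Lower bound (4): 3 * floor(s / 2)."""
    return 3 * (s // 2)


def claimed_upper_bound(s):
    """Bassalygo's claimed bound; the radicand 2s is an ASSUMPTION of this sketch."""
    return s + math.sqrt(2 * s) + 1


def violations(s_max):
    """All s in 1..s_max for which the lower bound exceeds the claimed bound."""
    return [s for s in range(1, s_max + 1)
            if lower_bound(s) > claimed_upper_bound(s)]
```

Under this assumption `violations(s_max)` contains every $s \ge 16$, while
$s = 15$ is not in the list, which agrees with the threshold quoted in the text.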



2 One-path algorithm 

Paull's matrix $P$ is an $r \times r$ matrix, where rows are indexed by input switches $I_i$,
columns by output switches $O_j$, and cell $p_{ij}$ contains the set of middle switches each
carrying a connection $(I_i, O_j)$. By convention, the request to be routed is $(I_1, O_1)$ and
we may assume that there exist two middle switches a and b, one appearing in the
first row and the other in the first column, but not both. Paull's method is to switch the
connection carried by a in the first row to be carried by b; then the request can be
routed through a. But switching that a to b means that if there is already a b in the
same column as that a, then we must switch that b to a. Again, we need to check
whether there is already an a collinear with that b, and so on. Such a path, called an
ab-path, alternates in vertical and horizontal turns, with an a at every vertical turn and
a b at every horizontal turn, and stops if and only if it hits a line not containing the
other symbol. Similarly, we can start the ba-path from the b in the first column (Fig.
2). Since the ab-path and the ba-path are disjoint, and there are at most $2r - 2$ a's and
b's, one of the two paths contains at most $r - 1$ symbols.
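The alternating path described above can be traced mechanically. The sketch
below is illustrative, not the authors' procedure: the matrix is stored as a
dictionary from 0-indexed (row, column) cells to sets of middle-switch symbols,
and the names `trace_path`, `transpose` and `example` are hypothetical. The
ba-path from the first column is obtained by tracing on the transposed matrix.

```python
def trace_path(cells, start_row, first, second):
    """Trace an alternating path starting from the `first` symbol in start_row.

    cells maps (row, col) to the set of middle switches in that cell.
    Returns the list of (row, col, symbol) entries whose symbol must be swapped.
    """
    path, row = [], start_row
    while row is not None:
        # Look for `first` in the current row.
        col = next((c for (r, c), syms in cells.items()
                    if r == row and first in syms), None)
        if col is None:
            break
        path.append((row, col, first))
        # Look for `second` in the column just entered.
        row = next((r for (r, c), syms in cells.items()
                    if c == col and second in syms), None)
        if row is not None:
            path.append((row, col, second))
    return path


def transpose(cells):
    """Swap the roles of rows and columns."""
    return {(c, r): syms for (r, c), syms in cells.items()}


# A 3 x 3 example; the request is for row 0 and column 0.
example = {
    (0, 1): {"a"},
    (1, 1): {"b"},
    (1, 2): {"a"},
    (2, 0): {"b"},
}
ab_path = trace_path(example, 0, "a", "b")
ba_path = trace_path(transpose(example), 0, "b", "a")  # coordinates are (col, row)
```

In this example the ab-path has three symbols and the ba-path has one, so the
shorter path needs a single rearrangement, within the $r - 1 = 2$ bound.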



Fig. 2. The ab-path and the ba-path



Bassalygo's idea is to alter an ab-path or a ba-path at some point to an ac-path or a
bc-path, and to use the scarcity of c to show the existence of a shorter path. He gave the
following rules for a one-path algorithm:

1. A path must start with an ab-path or a ba-path. 

2. Once it turns into an ac-path or a bc-path, it cannot turn back to ab-path or ba-path. 

3. When the first c appears in a path, the line containing two letters preceding it does 
not contain letter c . 

4. Every matrix letter is visited at most once. 

5. A path stops if and only if either it hits a line not containing the other symbol, or an 
ab-path (or ba-path) hits a symbol whose row and column contain no c . 

Definition: We call a path a rerouting path if it follows the rules given by
Bassalygo. A rerouting path can be divided into two parts, where the first part is
an ab-path (or ba-path) and the second part is an ac-path (or bc-path). Note that a
rerouting path can have a null second part.

For example, there are three rerouting paths in the Paull's matrix of Fig. 3. They are
ababab, abcb, and bababababababab. Note that the path acaca starting from cell
(1,12) is not a rerouting path, because by definition a rerouting path must start with
an ab-path or a ba-path. Also, the a's in cells (9,4) and (10,5) cannot turn to an ac-path
because the path would revisit an a already on it (violating rule 4). After we choose a
rerouting path, we can reroute the existing connections lying on it. The
scheme is to exchange a's and b's in the first part, which then starts an exchange of
c's and the other letter of the second part. If a rerouting path has a null second part,
which means the first part (an ab-path or ba-path) hits a symbol whose row and
column contain no c, we change the last letter to a c.

Fig. 3. A Paull's matrix



3 The lower bound and upper bound 

Before giving the construction for $\varphi_1(n,n,r;s) \ge 3\lfloor s/2 \rfloor$, we define the function
more clearly.

Definition: A network state is a legitimate arrangement of a Paull's matrix (each row
and each column has at most $n$ entries, all disjoint).

Definition: $\phi(x) = \min(\{\text{lengths of rerouting paths of network state } x\})$

Definition: $\varphi_1(n,n,r;s) = \max_{x \in N(n,n,r;s)} \phi(x)$, where $N(n,n,r;s)$ denotes the set of
all states of $\nu(n,n,r)$ in which the minimum-loaded middle switch carries $s$ connections.

Now we give the serial construction showing that $\varphi_1(n,n,r;2k) \ge 3k$. The general
pattern of construction should be clear from the examples given in Figs. 4 to 8.

Three points should be noted for understanding these examples: 
1. We have permuted the rows and columns such that the first s rows and the first s
columns all contain c's. The heavy lines denote this s x s matrix C. By rule 3, a
rerouting path can turn to c only when it is out of C.

2. Sometimes, the path is out of C but cannot turn to c because the c lies in the
same line with a symbol already on the path. By rule 5, the path cannot stop at this
c. But continuing the path would violate rule 4.

3. The shortest paths in these examples are the ab-paths and the ba-paths.

Fig. 4. k = 2 and k = 3

Fig. 5. k = 4

Fig. 7. k = 6

Fig. 8. k = 7



Note that $3\lfloor s/2 \rfloor > s + \sqrt{2s} + 1$ for all $s \ge 16$, contradicting Bassalygo's claim.
Considering the size of the Paull's matrix, we have Theorem 1.

Theorem 1:

$\varphi_1(n,n,r;s) \ge \min(3\lfloor s/2 \rfloor, r-1)$ (7)

Proof: By the result of Benes [2] and the above construction, the inequality holds, and
equality holds if $3\lfloor s/2 \rfloor \ge r - 1$.

Theorem 2:

$\varphi_1(n,n,r;s) \le \min(2s+1, r-1)$ (8)

Proof: Assume that $(I_i, O_j)$ is the next request, row $i$ contains a but not b, and
column $j$ contains b but not a. Define $D = \{\text{first } s \text{ rows}\} \cup \{\text{first } s \text{ columns}\}$. Then
we can find two rerouting paths with null second parts, starting with the above a
and b respectively. Since there are at most $4s$ a's and b's in $D$, one of the two
paths must get out of $D$ within $2s + 1$ steps. Then we can change the last letter to c and
interchange a's and b's in the path. Thus the request can be routed in $2s + 1$
rearrangements.



4 Two-path algorithm 

We may also route a request by a two-path algorithm. For example, routing the
request $(I_1, O_1)$ for the network state in Fig. 6 under a one-path algorithm
requires $3\lfloor 10/2 \rfloor = 15$ rearrangements if $r \ge 16$. But we can rearrange the
connections lying on the two paths as shown in Fig. 9, and then route the request
through c after only 4 rearrangements.

Fig. 9.

However, the number of rearrangements under a two-path algorithm is not always
fewer than $\varphi_1(n,n,r;s)$. For example, we can route the request for the
network state in Fig. 4 by rearranging the connections lying on the two paths (cbcb
and caca) as in Fig. 10, but 8 rearrangements are needed. Clearly, the request can be
routed after 6 rearrangements under a one-path algorithm. Moreover, this is a
counterexample to $\varphi(n,n,r;s) = s + 1$.

Fig. 10. s = 4
Abstract. The 3-stage Clos network is generally considered the most
basic multistage interconnecting network (MIN). The nonblocking property
of such networks has been extensively studied in the past. However,
there are only a few lower bound results regarding wide-sense nonblocking.
We show that in the classical circuit switching environment, for large
r, $2n - 1$ center switches are necessary to guarantee wide-sense
nonblocking, where r is the number of input switches and n is the number of
inlets of each input switch. For the multirate environment, we show that
for large enough r any 3-stage Clos network needs at least $3n - 2$ center
switches to guarantee wide-sense nonblocking. Our proof works even for the
2-rate environment.



1 Introduction 

The 3-stage Clos network is generally considered the most basic multistage
interconnecting network (MIN). It is symmetric with respect to the center stage.
The first stage, or the input stage, has $r$ $n \times m$ crossbar switches; the center
stage has $m$ $r \times r$ crossbar switches. The $n$ inlets (outlets) on each input
(output) switch are the inputs (outputs) of the network. There is exactly one link
between every center switch and every input (output) switch. We use $C(n,m,r)$
to denote a 3-stage Clos network. An example of $C(3, 3, 4)$ is shown in figure 1.

In classical circuit switching, i.e. when every link can serve only one connection
request, three types of nonblocking properties have been extensively studied:
strictly nonblocking (SNB), wide-sense nonblocking and rearrangeably nonblocking (RNB).
The focus of this paper is to establish lower bounds for the number of center
switches needed to guarantee wide-sense nonblocking. A network is wide-sense
nonblocking (WSNB) if a new call is always routable as long as all previous
requests were routed according to a given routing algorithm.

In multirate environments, a request is a triple $(u, v, w)$ where $u$ is an inlet,
$v$ an outlet and $w$ a weight which can be thought of as the bandwidth requirement
(rate) of that request. We normalize the weights so that $1 \ge w > 0$, and each
link has capacity one; i.e., it can carry any number of calls as long as the sum
of weights of these calls does not exceed one.

Clos [3] proved that for the classical model $2n - 1$ center switches are necessary
and sufficient to guarantee SNB for $C(n, m, r)$. Benes [1,2] proved that $C(n, m, 2)$
is WSNB (using the packing routing) if and only if $m \ge 3n/2$, thus giving hope
that WSNB can be achieved with fewer center switches than SNB in general.
However, recently Du, Gao, Fishburn and Hwang [5] gave the surprising result
that $C(n,m,r)$ for $r \ge 3$ is WSNB under the packing routing if and only if
$m \ge 2n - 1$; namely, it requires the same number of center switches as SNB. In
this paper we further dash the hope by showing that for large $r$, $C(n,m,r)$ is
WSNB (under any routing algorithm) if and only if $m \ge 2n - 1$. While WSNB,
as commented above, plays a very restrictive role in the classical model, the
multirate environment provides a fertile playground. This is because we have
a new dimension, the rate, with which to design routing algorithms. For example,
Gao and Hwang [6] gave a routing algorithm such that $C(n, m, r)$ is WSNB if
$m \ge 5.75n$. If there are only two different rates, then the requirement is reduced
to $4n$. In this paper we show that $3n - 2$ is a lower bound for WSNB under any
routing algorithm, and this bound is obtained by using only two different rates. This
lower bound provides a gauge to measure how good the algorithm of Gao and
Hwang is, and how much room there is for improvement. We will also discuss some
impact on repackable algorithms, a relatively new type of nonblocking.

Fig. 1. C(3,3,4)



2 Main Results 



We first study the classical model, where every link can serve only one connection 
request. 



Theorem 1. For $r \ge (n-1)\binom{2n-2}{n-1} + 2$, $C(n, m, r)$ is WSNB if and only if
$m \ge 2n - 1$.



Lower Bounds for Wide-Sense Non-blocking Clos Network 215 



Proof. Since SNB implies WSNB, it suffices to prove the “only if” part. Suppose
$r \ge (n-1)\binom{2n-2}{n-1} + 2$ and $m = 2n - 2$. Let the network contain a set
of connections involving $n - 1$ inputs from each input switch but not
involving the output switch $O$ (easily verified to be a feasible state). Consider the
$\binom{2n-2}{n-1}$ distinct subsets of $n - 1$ center switches. The $n - 1$ connections from
each input switch are routed by one such subset. For the given $r$ there must exist
a subset $Y$ which routes a set $X$ of $n$ input switches. Consider a new set of $n$
requests $\{(x, o) : x \in X, o \in O\}$. Each of the $n$ requests must be routed through
a distinct center switch, which is not in $Y$. Hence at least

$|X| + |Y| = 2n - 1$

center switches are needed. (See figure 2) □
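The pigeonhole step in this proof can be illustrated numerically. The sketch
below is hedged: it assumes the threshold in Theorem 1 reads
$r \ge (n-1)\binom{2n-2}{n-1} + 2$, it uses the fact that $(n-1)\binom{2n-2}{n-1} + 1$ input
switches already force some $(n-1)$-subset of center switches to serve at least
$n$ of them, and the function names are hypothetical.

```python
import random
from collections import Counter
from math import comb


def theorem1_threshold(n):
    """Assumed threshold on r in Theorem 1: (n-1) * C(2n-2, n-1) + 2."""
    return (n - 1) * comb(2 * n - 2, n - 1) + 2


def crowded_subset(n, seed=0):
    """Let each input switch use an arbitrary (n-1)-subset of 2n-2 center switches.

    Returns the most frequently used subset and the number of input switches
    using it; by the pigeonhole principle that number is at least n.
    """
    rng = random.Random(seed)
    centers = range(2 * n - 2)
    inputs = (n - 1) * comb(2 * n - 2, n - 1) + 1
    usage = Counter(frozenset(rng.sample(centers, n - 1)) for _ in range(inputs))
    return usage.most_common(1)[0]
```

For example, `theorem1_threshold(3)` evaluates to 14, and `crowded_subset(3)`
returns a 2-subset of the 4 center switches that serves at least 3 input
switches, whatever random assignment is drawn.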

Next we consider the multirate model where each request has a rate requirement.
We normalize the weights to be a number between zero and one, and each
link has capacity one.

Theorem 2. $C(n, m, r)$ is not WSNB for any two rates $B$, $b$ satisfying $B + b > 1$
if $m \le \min\{k, n\} + 2n - 3$ and $r$ is large enough, where $k = \lfloor 1/b \rfloor$.

Proof. Without loss of generality, we may assume that $B = 1$ and $b = 1/k$ for
some integer $k \ge 2$.

In phase 1, all requests have weight $b$ and come from the first inlet of each input
switch (the output will be specified later if necessary).

Step 1. Each input generates one request. Since $r$ is large enough, there exists
a large set $I_1$ of input switches whose requests are all routed through the same
middle switch $M_1$.

Step 2. Each input in $I_1$ generates a second request. Partition $I_1$ into groups of
size $k + 1$ such that inputs in the same group all make the second request to the
same output; this is possible since $k + 1 \le nk$, the capacity of an output switch.
But the total weight of each group, $b(k + 1)$, requires at least one new center switch
$M_2$ to carry. Therefore in each group there is at least one input whose second request
is carried by some center switch other than $M_1$. Since $I_1$ is large there exists a
large set $I_2$ of input switches whose requests are routed through the same set of
middle switches $M_1, M_2$.

Step $\ell$. Each input in $I_{\ell-1}$ generates an $\ell$-th request. Partition $I_{\ell-1}$ into
groups of size $(\ell-1)k + 1$ such that inputs in the same group all make the
request to the same output; this is possible if $(\ell-1)k + 1 \le nk$. But the total
weight of each group, $(\ell-1)bk + b$, is greater than the capacity of the $\ell - 1$
center switches used before. Therefore in each group there is at least one input whose
$\ell$-th request is carried by some center switch other than $M_1, M_2, \ldots, M_{\ell-1}$. Since
$I_{\ell-1}$ is large there exists a large set $I_\ell$ of input switches whose requests are routed
through the same set of middle switches. (See figure 3)

Step k. Each input in $I_{k-1}$ generates a $k$-th request. Partition $I_{k-1}$ into groups
of size $(k-1)k + 1$ such that inputs in the same group all make the $k$-th request
to the same output; this is possible if $(k-1)k + 1 \le nk$. But the total weight
of each group, $(k-1)bk + b$, is greater than the capacity of the $k - 1$ center
switches used before. Therefore in each group there is at least one input whose $k$-th
request is carried by some center switch other than $M_1, M_2, \ldots, M_{k-1}$. Since $I_{k-1}$
is large there exists a large set $I_k$ of input switches whose requests are routed
through the same set of middle switches. Let $M$ denote the set of common
center switches used by $I_k$.

In phase 2, consider only the input switches in $I_k$. Let each of them generate
$n - 1$ weight-$B$ requests going to a set of clean output switches. Since
$B + b > 1$, none of these requests can be routed through $M$. By Theorem 1,
another set of $2n - 2$ center switches is needed. Hence the total of $k + 2n - 3$
middle switches is necessary.
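The arithmetic behind phase 1 can be tabulated step by step. The sketch below
is illustrative and assumes $b = 1/k$ as in the proof; the function name
`phase1_groups` and the tuple layout are hypothetical. Exact rational
arithmetic avoids rounding issues when comparing weights with capacities.

```python
from fractions import Fraction


def phase1_groups(k, n):
    """For steps 2..min(k, n), list the group size and weight used in phase 1.

    Each row is (step, group_size, total_weight, fits_output, overflows), where
    fits_output checks group_size <= n*k (the group fits one output switch) and
    overflows checks that the weight exceeds the capacity step-1 of the
    previously used center switches.
    """
    b = Fraction(1, k)
    rows = []
    for step in range(2, min(k, n) + 1):
        size = (step - 1) * k + 1
        weight = size * b
        rows.append((step, size, weight, size <= n * k, weight > step - 1))
    return rows
```

For every step up to $\min(k, n)$ the group fits on one output switch while its
weight, $(\ell - 1) + 1/k$, exceeds the capacity of the $\ell - 1$ center switches
already in use, which is what forces a new center switch at each step.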

Corollary 3. For $k \ge n$, $m \le 3n - 3$ is impossible for WSNB when $r$ is
large.

3 Conclusion 

A network is repackable if existing calls can be rearranged at any moment a
connection is deleted (e.g., a call hangs up). Repacking algorithms have been
studied for both the classical model [7] and the multirate model [8,9] to show
that they can help to reduce the number of center switches needed for $C(n, m, r)$
to be nonblocking. Theorems 1 and 2 show that no repacking algorithm can be
effective when $r$ is large: since the request sequences we construct in these
theorems involve no deletion, they apply to all repacking algorithms. Thus our results
solidly confirm a major difference between WSNB and repackable on the one hand
and SNB and RNB on the other, namely that the numbers of center switches required
for $C(n,m,r)$ are independent of $r$ in the latter case. In other words, the hope of
finding an effective WSNB or repackable algorithm is restricted to small $r$.

Fig. 3. Figure for step $\ell$ of Theorem 2
Abstract. This paper investigates multirate multicast Clos switching
networks which are nonblocking in a wide sense, where a compatible
multicast request is guaranteed to be routed without disturbing the existing
network if all requests have conformed to a given routing scheme.
The routing strategy shows that $(2.875n - 1)\min(k + r^{1/k}) + 1$ middle
switches are sufficient for any multirate multicast requests, whereas strictly
nonblocking multirate switching networks require an infinite number of
middle switches if the range of weights can be widely distributed.

This paper also shows that Yang and Masson's nonblocking multicast
Clos network for pure circuit switching is rearrangeable for multirate
multicast communication if each weight is chosen from a given finite set of
integer multiplicity. Note that a general rearrangeability of multirate Clos
networks even for point-to-point communications has not been known
yet. In our work, the number of middle switches depends only on the
configuration of the switch itself and not on the patterns of connection
requests, which is highly desirable for constructing large scale switching
networks.



1 Introduction 

A Clos network has been widely employed for telephone switching systems in- 
stead of crossbar switches because of its asymptotic advantage over the crossbars 
while providing nonblocking behavior [3]. The Clos switching network was first 
developed for a pure circuit switching, in which a connection request should set 
up a physical path between its source and destination, and the path should be 
dedicated to the single service during the entire conversation so that all links on 
the path cannot be used for any other communications. 

Advances in digital technology have allowed several telephones and data
terminals to share a single link if the total load on the link does not exceed its
capacity [2,6,7]. In this multirate environment, each request takes a portion of a
link's bandwidth. A spectrum of network services has been introduced during the
past decade, which need different connection characteristics. In those services, it
is necessary to transmit their information to multiple destinations. To provide
flexible communication environments and to obtain cost-effectiveness, there have
been numerous efforts to integrate the dissimilar network services into a single
network. The advent of asynchronous transfer mode (ATM) has further pushed
the combination of heterogeneous communication networks. Current
implementations of multicast switching devices, however, are based on blocking networks
because nonblocking multicast switching networks were known to require
unrealistic hardware complexities.

In this paper, we study routing schemes for multirate multicast Clos net- 
works to obtain the minimum hardware complexity. The numbers of middle stage 
switches in the Clos networks depend only on the parameters of the network it- 
self but not on the pattern of connection requests, which is highly desirable for 
constructing a real switching network. 

2 Multistage Switching Networks 

An interconnection network is represented as a directed weighted graph $G = (V, E)$.
$V_I \subset V$ is a set of external inlets, each of which has one outgoing edge and no
incoming edge. $V_O \subset V$ is a set of external outlets, each of which has one
incoming edge and no outgoing edge. For an $n$-stage network, $V_I$ and $V_O$ are
said to be in stage 0 and $n + 1$, respectively. Nodes in stage $i$ have directed
edges only to nodes in stage $i + 1$ for $0 \le i \le n$, and there exists only one
edge between any pair of nodes. We construct uniform 3-stage Clos networks,
denoted as $C(n_1, r_1, n_2, r_2, m)$, requiring that each node in stage 1 (input stage)
has $n_1$ incoming edges and $m$ outgoing edges to every node in stage 2 (middle
stage), in which each node has $r_1$ incoming edges and $r_2$ outgoing edges to every
node in stage 3 (output stage). Each node in stage 3 has $m$ incoming edges
and $n_2$ outgoing edges. Each switch module is assumed to be a crossbar switch
which has nonblocking multicast capability, even though it can be recursively
constructed with smaller Clos subnetworks. A symmetric Clos network, denoted
as $C(n, r, m)$, is induced from the asymmetric network with $n_1 = n_2$ and $r_1 = r_2$.

A connection request for point-to-point connection is a triple $(x, y, \omega)$, where
$x \in V_I$, $y \in V_O$ and $\omega$ is a normalized bandwidth requirement (weight) of the
connection request. A set of connection requests is said to be compatible if the
sum of all weights passing each external link does not exceed its capacity. A
connection request is compatible with the existing network if adding the request
does not cause capacity overflows for any external links. A multicast connection
request is defined as a triple $(x, Y, \omega)$, where $Y \subseteq V_O$ denotes a set of output
ports. A point-to-point request can easily be represented with the multicast
notation by imposing $|Y| = 1$. A route is a tree connecting an input port to a set
of output ports through middle stage switches. A configuration is a set of all
routes. A configuration is said to be satisfied if we can find all routes in such a
way that the sum of weights on each $e \in E$ is not larger than its capacity.

A switching network is strictly nonblocking, or simply nonblocking, if a new
connection request can be routed without disturbing the existing network no
matter how the previous calls were routed. It is well known that a symmetric
Clos network, $C(n, r, 2n - 1)$, is strictly nonblocking in circuit switching
point-to-point communications [3]. Melen and Turner [6] found a sufficient condition
for a nonblocking Clos network for multirate point-to-point communications. They
took advantage of the higher speed of internal links over external ones to reduce
hardware requirements. Chung and Ross [2] determined necessary and sufficient
conditions for multirate interconnection networks to be nonblocking for both
discrete and continuous weights, especially when the external speed is the same as
the internal one.

A network is said to be nonblocking in a wide sense if a new request can be
satisfied without interfering with the existing network configuration under the
condition that all requests comply with a given routing algorithm. Benes [1] proved
that a Clos network $C(n, 2, m)$ is wide-sense nonblocking in circuit switching if
$m = \lfloor 3n/2 \rfloor$. Melen and Turner [6] devised a Clos network $C(n, r, 8n - 2)$ that is
wide-sense nonblocking in a multirate environment by assembling two nonblocking
Clos networks in parallel, each of which has $4n - 1$ middle stage switches. All
connections with weights more than 1/2 are routed through only one of the subnetworks,
and all requests with weights no more than 1/2 are routed through the other
subnetwork.

Hwang [5] gave the condition $m = O(nr)$ for rearrangeable multi-connection
3-stage Clos networks, a generalization of interconnection networks in which
a set of input ports is able to connect to a set of output ports. Yang and
Masson [9] showed that a Clos network in circuit switching is nonblocking for
multicast requests when $m > \min_k (n-1)(k + r^{1/k}) = O(n \ln r / \ln \ln r)$, where
$1 \le k \le \min(n-1, r)$. Yang [8] extended her previous result to obtain multirate
nonblocking multicast networks with $m > \min_k \lfloor 1/b \rfloor (n-1)(k + r^{1/k})$. A
weakness of the result is that the network requires an unbounded number of middle
stage switches when $b$ goes to 0.

3 Preliminaries 

We can denote a set of destinations as a set of output switches instead of output
ports because an output switch module can fan out to as many outlets as needed once it
receives a compatible request. We use a vector notation for a set of destinations
in such a way that the $j$-th element of a vector is 1 if the destinations contain
output switch $j$, and 0 otherwise. A multirate multicast request is denoted as
$C_i = (x_i, y_i, \omega_i)$, where $x_i \in \{1, \ldots, nr\}$ is an input port, $y_i$ is a vector of size $r$
denoting a set of output switches in a bit-vector format, and $\omega_i$ is a required
weight. For the connection request, we can define a connection vector as
$I_i = \omega_i \cdot y_i$. Figure 1 shows four multicast connection requests $C_1 = (2, (1,1,1,1), .2)$,
$C_2 = (6, (0,0,1,0), .3)$, $C_3 = (8, (1,1,1,1), .4)$ and $C_4 = (11, (0,1,0,0), .5)$. Their
connection vectors are $I_1 = (.2, .2, .2, .2)$, $I_2 = (0, 0, .3, 0)$, $I_3 = (.4, .4, .4, .4)$ and
$I_4 = (0, .5, 0, 0)$.
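The connection vectors of the four example requests can be computed directly
from the definition $I_i = \omega_i \cdot y_i$. The sketch below is illustrative; the names
`connection_vector`, `requests` and `load` are hypothetical, and exact
fractions stand in for the decimal weights.

```python
from fractions import Fraction


def connection_vector(request):
    """Scale the destination bit vector y by the weight w of request (x, y, w)."""
    _, y, w = request
    return tuple(w * bit for bit in y)


# The four requests of Figure 1: (input port, output-switch bit vector, weight).
requests = [
    (2, (1, 1, 1, 1), Fraction(2, 10)),
    (6, (0, 0, 1, 0), Fraction(3, 10)),
    (8, (1, 1, 1, 1), Fraction(4, 10)),
    (11, (0, 1, 0, 0), Fraction(5, 10)),
]
vectors = [connection_vector(c) for c in requests]

# Component-wise sum: total weight loaded on each output switch.
load = tuple(sum(col) for col in zip(*vectors))
```

Here `load` evaluates to (.6, 1.1, .9, .6), the weight carried by each of the
four output switches in the example.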

Definition 1 (relational operators of vector). For two given vectors $x$ and
$y$ of length $n$, $x \le y$ if and only if $x_i \le y_i$ for all $i = 1, 2, \ldots, n$; $x < y$ if $x \le y$
and $x \ne y$. Similarly, we can define the $\ge$ and $>$ operators.

For example, if $x = (.3, .2, .5)$, $y = (.4, .3, .9)$ and $z = (.4, .5, .3)$ then $x < y$
but $x \not< z$ because $x_3 > z_3$. The $k$-th element of a connection vector $I$ denotes
the weight on output switch $k$, so that the sum of connection vectors, $\sum_{\text{all } i} I_i$,
represents the sum of weights loaded on each output switch.

Fig. 1. C(3, 4, 5) Clos Network. Connection requests $C_1 = (2, (1,1,1,1), .2)$,
$C_2 = (6, (0,0,1,0), .3)$, $C_3 = (8, (1,1,1,1), .4)$ and $C_4 = (11, (0,1,0,0), .5)$.

To describe the configuration of links between the middle stage and the output stage,
we define a destination vector $M_j$ for each middle switch $j$ such that $M_j(k)$ is the
sum of weights loaded on the link between middle switch $j$ and output switch $k$. In
Figure 1, for example, $M_1 = (0, .4, .4, 0)$, $M_2 = (.2, .2, 0, 0)$, $M_3 = (.4, 0, .2, .6)$,
$M_4 = (0, 0, .3, 0)$ and $M_5 = (0, .5, 0, 0)$. Like connection vectors, the sum of
destination vectors, $\sum_{\text{all } j} M_j$, represents the sum of weights loaded on each
output switch. Because each output switch has $n$ output ports, its load is at
most $n$. We can easily obtain the relationship between connection vectors and
destination vectors after routing all requests as follows:

$$\sum_{\text{all } i} I_i = \sum_{\text{all } j} M_j \le n$$

Definition 2 (min and max of vectors). For two vectors, $x = (x_1, x_2, \ldots, x_n)$
and $y = (y_1, y_2, \ldots, y_n)$, operators min and max are defined as
follows:

$$\min(x, y) = (\min(x_1, y_1), \min(x_2, y_2), \ldots, \min(x_n, y_n))$$
$$\max(x, y) = (\max(x_1, y_1), \max(x_2, y_2), \ldots, \max(x_n, y_n))$$



Multirate Multicast Switching Networks 



223 



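Generally, for k vectors $x_j = (x_{j,1}, x_{j,2}, \ldots, x_{j,n})$ for $1 \le j \le k$,

$$\min_{1 \le j \le k}(x_j) = \Big(\min_{1 \le j \le k}(x_{j,1}),\ \min_{1 \le j \le k}(x_{j,2}),\ \ldots,\ \min_{1 \le j \le k}(x_{j,n})\Big)$$
$$\max_{1 \le j \le k}(x_j) = \Big(\max_{1 \le j \le k}(x_{j,1}),\ \max_{1 \le j \le k}(x_{j,2}),\ \ldots,\ \max_{1 \le j \le k}(x_{j,n})\Big)$$

For the previous destination vectors, $\min_{1 \le j \le 5}(M_j) = (0, 0, 0, 0)$ and
$\max_{1 \le j \le 5}(M_j) = (.4, .5, .4, .6)$.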
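The vector operators of Definitions 1 and 2 translate directly into component-wise code. The following is a minimal sketch with hypothetical helper names (`vec_le`, `vec_lt`, `vec_min`, `vec_max`); it reproduces the examples given in the text.

```python
def vec_le(x, y):
    """x <= y: every component of x is at most the matching component of y."""
    return all(a <= b for a, b in zip(x, y))


def vec_lt(x, y):
    """x < y: x <= y and the two vectors differ."""
    return vec_le(x, y) and tuple(x) != tuple(y)


def vec_min(*vectors):
    """Component-wise minimum of any number of equally long vectors."""
    return tuple(min(column) for column in zip(*vectors))


def vec_max(*vectors):
    """Component-wise maximum of any number of equally long vectors."""
    return tuple(max(column) for column in zip(*vectors))


x, y, z = (0.3, 0.2, 0.5), (0.4, 0.3, 0.9), (0.4, 0.5, 0.3)
M = [
    (0, 0.4, 0.4, 0),
    (0.2, 0.2, 0, 0),
    (0.4, 0, 0.2, 0.6),
    (0, 0, 0.3, 0),
    (0, 0.5, 0, 0),
]
```

Note that the order is only partial: neither `vec_le(x, z)` nor `vec_le(z, x)` holds for the vectors above.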

4 Wide-Sense Nonblocking Multirate Multicast Networks 

We consider a special network in which the weight of a new multicast request is
no more than $1/(p+1)$ for some positive integer p.

Theorem 1. A Clos network C(n, r, m) is nonblocking for multirate multicast
when $C_{new} = (x, y, \omega_s)$, where $\omega_s \le 1/(p+1)$, if each request uses at most k
middle switches and

$$m > \frac{(\beta n - \omega_s)(p+1)}{p}\,\big(k + r^{1/k}\big).$$



Proof. $\omega_s \le 1/(p+1)$ implies $1 - \omega_s \ge p/(p+1)$. Let $m'$ be the number of
middle switches blocking the new multicast request of weight $\omega_s$ from x's input
switch. We obtain that

$$m'(1 - \omega_s) \le (\beta n - \omega_s)k \quad\text{and}\quad m' \le \frac{(\beta n - \omega_s)(p+1)}{p}\,k.$$

Consider the $m'' \times r$ destination matrix obtained by discarding at most $m'$ rows
whose corresponding middle switches are blocking the new request from the input
switch. Suppose that $1 - \min_{1 \le j \le k}(M_j) \not\ge I_{new}$, which means that no k middle
switches can satisfy the new multicast. Let $c_1(j)$ be the number of elements
in the j-th row whose values are greater than $1 - \omega_s$. Then

$$m'' c_1 \frac{p}{p+1} \le m'' c_1 (1 - \omega_s) \le \sum_{j=1}^{m''} c_1(j)(1 - \omega_s) \le (\beta n - \omega_s)\,r$$

where $c_1 = \min_{1 \le j \le m''}\{c_1(j)\}$ and $c_1 \ne 0$. Hence, we obtain that

$$m'' \le \frac{(\beta n - \omega_s)(p+1)}{p} \cdot \frac{r}{c_1} \tag{2}$$

Without loss of generality, assume the h-th row has the minimum. We can route
a part of the request to $r - c_1$ output switches by using the h-th row and delete
those $r - c_1$ columns from the destination matrix to find the next middle switch
for routing the remaining destinations. Generally, assume that only $c_{i-1}$ output
switches remain to be routed, using an $m'' \times c_{i-1}$ destination matrix, for $i < k$.
Let $c_i(j)$ be the number of elements in the j-th row whose values are greater
than $1 - \omega_s$, and let $c_i$ be the minimum of $c_i(j)$ over all j. Then,

$$m'' c_i \frac{p}{p+1} \le m'' c_i (1 - \omega_s) \le \sum_{j=1}^{m''} c_i(j)(1 - \omega_s) \le (\beta n - \omega_s)\,c_{i-1}$$

$$m'' \le \frac{(\beta n - \omega_s)(p+1)}{p} \cdot \frac{c_{i-1}}{c_i} \tag{3}$$

where $c_i \ne 0$ for $i < k$. Otherwise, it contradicts our assumption that no k
middle switches can satisfy the new multicast request. When $i = k$, each row
vector has at least one element whose value is greater than $1 - \omega_s$. Therefore,

$$m'' \frac{p}{p+1} \le m''(1 - \omega_s) \le \sum_{j=1}^{m''} c_k(j)(1 - \omega_s) \le (\beta n - \omega_s)\,c_{k-1}$$

$$m'' \le \frac{(\beta n - \omega_s)(p+1)}{p}\,c_{k-1} \tag{4}$$

The minimum of a sequence is not larger than its geometric mean, so $m''$ can be
bounded from (2), (3) and (4) as

$$m'' \le \frac{(\beta n - \omega_s)(p+1)}{p}\,r^{1/k}.$$
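The geometric-mean step can be illustrated numerically. In the sketch below, A stands for the common factor $(\beta n - \omega_s)(p+1)/p$, and the chain of column counts $r > c_1 > \cdots > c_{k-1}$ is a made-up example; the function name `telescoping_bounds` is hypothetical.

```python
import math


def telescoping_bounds(A, r, c):
    """Right-hand sides of (2), (3) and (4) for a chain c = [c_1, ..., c_{k-1}]."""
    chain = [r] + list(c)
    # Ratios r/c_1, c_1/c_2, ..., c_{k-2}/c_{k-1}.
    ratios = [chain[i] / chain[i + 1] for i in range(len(c))]
    # The last bound, for i = k, is A * c_{k-1}.
    return [A * q for q in ratios] + [A * chain[-1]]


A, r, c = 7.5, 64, [20, 5]  # illustrative values only, k = 3
bounds = telescoping_bounds(A, r, c)
geometric_mean = math.prod(bounds) ** (1 / len(bounds))
```

The product of the k bounds telescopes to $A^k r$, so their geometric mean is $A\,r^{1/k}$ whatever the intermediate $c_i$ are, and the smallest bound cannot exceed it.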

To provide a general multirate multicast switching network, we extend a
routing algorithm called the quota scheme from Gao and Hwang [4]. Connection
requests are partitioned into large calls ($C_L$) and small calls ($C_S$) by their
weights. In addition, the set of middle switches (M) is also assumed to be
partitioned into $M_S$ and $M_L$, whose sizes are $m_S$ and $m_L$, respectively. The
algorithm forces a $C_S$ to use only $M_S$ but allows a $C_L$ to use not only $M_L$ but
also $M_S$. From Theorem 1, a multicast request with $\omega_s \le 1/(p+1)$ for some
positive p is classified as a small call and one with $\omega_L > 1/(p+1)$ as a large
call. Of course, $\omega_L$ should not be greater than B because it is the upper bound
of all connection requests. For simplicity, let us define $f(r) = \min_k (k + r^{1/k})$,
which is known to have an approximate minimum of $O(\ln r / \ln \ln r)$ [9].
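To get a feeling for f(r), the minimum can be evaluated by brute force. This is a minimal sketch: it searches integer k only, ignores any additional cap on k, and the function name `f` with its `k_max` parameter is an illustrative choice.

```python
import math


def f(r, k_max=None):
    """Return (min over integer k >= 1 of k + r**(1/k), minimizing k)."""
    if k_max is None:
        # Beyond k of about log2(r) the term r**(1/k) drops below 2 and k
        # dominates, so a slightly larger search range is sufficient.
        k_max = math.ceil(math.log2(r)) + 1 if r > 1 else 1
    return min((k + r ** (1.0 / k), k) for k in range(1, k_max + 1))


value_16, best_k_16 = f(16)
```

For r = 16 the candidates are 17, 6, about 5.52, 6 and about 6.74 for k = 1, ..., 5, so the minimum is attained at k = 3.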

Theorem 2. The multirate 3-stage Clos network C(n, r, m), in which weights
are within the range [b, B] and external links can operate at $\beta$, is nonblocking
in a wide sense if

$$m > \begin{cases} \dfrac{\beta n (Bp + B + p - 1)(p+1)}{p^2}\, f(r) & \text{for } B \le 23/32 \\[2ex] \left(\dfrac{15 \beta n}{8} + n - 1\right) f(r) & \text{for } B > 23/32 \end{cases}$$

where $p = \lfloor 1/B \rfloor$.

Proof. Assume that a large call $C_l = (x, y, \omega_L)$ is compatible with the
existing configuration. Let $M'_S$ be the subset of $M_S$ blocking the large call from
x's input switch, and let $M'_L$ be the subset of $M_L$ blocking the request from the
input switch by carrying exactly p calls; their sizes are $m'_S$ and $m'_L$, respectively.
Each request is supposed to replicate its message at most k times at the input
switch. Because of the compatibility, the maximum weight going to the middle
stage out of the input switch is at most $(\beta n - \omega_L)k$. Therefore

$$m'_L \frac{p}{p+1} + m'_S (1 - B) \le (\beta n - \omega_L)\,k \tag{5}$$

Let $M''_S = M_S \setminus M'_S$ and $M''_L = M_L \setminus M'_L$ be the subsets of $M_S$ and $M_L$ which
are available for the large call, respectively. Their sizes are denoted as $m''_S$ and
$m''_L$. To find the maximum number of blocking links to output switches, let
us consider the $(m''_S + m''_L) \times r$ destination matrix M. Suppose that no k middle
switches from the $m''_S + m''_L$ can satisfy the request. We will use the same notation
for $c_i(j)$ and $c_i$ as in Theorem 1, but with $c_i = \min_{j \in M''_S \cup M''_L}\{c_i(j)\}$.

$$\sum_{j \in M''_L} c_1(j) \frac{p}{p+1} + \sum_{j \in M''_S} c_1(j)(1 - B) \le (\beta n - \omega_L)\,r$$

$$m''_L \frac{p}{p+1} + m''_S (1 - B) \le (\beta n - \omega_L) \frac{r}{c_1} \tag{6}$$

We apply a similar method as before to contract the destination matrix and
obtain

$$m''_L \frac{p}{p+1} + m''_S (1 - B) \le (\beta n - \omega_L) \frac{c_{i-1}}{c_i} \quad \text{for } i < k \tag{7}$$

$$m''_L \frac{p}{p+1} + m''_S (1 - B) \le (\beta n - \omega_L)\, c_{k-1} \quad \text{for } i = k \tag{8}$$

From (6), (7) and (8), we get

$$m''_L \frac{p}{p+1} + m''_S (1 - B) \le (\beta n - \omega_L)\, r^{1/k} \tag{9}$$

Because $m'_L + m''_L = m_L$ and $m'_S + m''_S = m_S$, we obtain the following by
summing up (5) and (9):

$$m_L \le \frac{[(\beta n - \omega_L) f(r) - (1 - B) m_S](p+1)}{p}$$

From Theorem 1, the maximum numbers of blocking middle switches $m_S$ and $m_L$ in
$M_S$ and $M_L$ are obtained by $\omega_s \to 0$ and $\omega_L \to 0$:

$$m_S = \frac{\beta n (p+1)}{p}\, f(r), \qquad m_L = \frac{\beta n (Bp + B - 1)(p+1)}{p^2}\, f(r).$$
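The algebra behind these two quantities can be checked with exact rational arithmetic. The sketch below is not the authors' derivation; it assumes $m_S = \beta n (p+1) f(r)/p$ (Theorem 1 with $\omega_s \to 0$) and the summed inequality for $m_L$ with $\omega_L \to 0$, and it works with the coefficients of $\beta n f(r)$ only. The function name `blocking_coefficients` is hypothetical.

```python
from fractions import Fraction


def blocking_coefficients(B, p):
    """Coefficients of beta*n*f(r) in m_S, in m_L, and in their sum."""
    B = Fraction(B)
    m_s = Fraction(p + 1, p)                        # Theorem 1, omega_s -> 0
    m_l = (1 - (1 - B) * m_s) * Fraction(p + 1, p)  # summed inequality, omega_L -> 0
    return m_s, m_l, m_s + m_l


def closed_form_total(B, p):
    """(Bp + B + p - 1)(p + 1) / p^2, the coefficient appearing in g_1(B)."""
    B = Fraction(B)
    return (B * p + B + p - 1) * (p + 1) / Fraction(p * p)


m_s, m_l, total = blocking_coefficients(Fraction(1, 2), 2)
```

For B = 1/2 and p = 2 the coefficients are 3/2 and 3/8, which add up to 15/8.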



Dongsoo S. Kim and Ding-Zhu Du



Hence, the network is wide-sense nonblocking if one more middle switch is
provided in addition to $m_S + m_L$, that is, if

$$m > \frac{\beta n (Bp + B + p - 1)(p+1)}{p^2}\, f(r). \tag{10}$$

When B = 1/2, so that p = 2, a Clos network $C(n, r, m_1)$ is nonblocking if
$m_1 > \frac{15 \beta n}{8} f(r)$. For the other case of $\omega > 1/2$, a Clos network $C(n, r, m_2)$ is
nonblocking if $m_2 > (n - 1) f(r)$ because a link cannot carry more than one
request anyhow. We can combine these two Clos networks in parallel to construct
a general wide-sense nonblocking Clos network C(n, r, m) for multirate multicast
in which $m > (15 \beta n / 8 + n - 1) f(r)$. In this network, all multicast requests with
$\omega \le 1/2$ are routed through the first sub-network, and all requests with $\omega > 1/2$
are routed through the other sub-network.

Let us compare $g_1(B) = \beta n (Bp + B + p - 1)(p + 1) f(r) / p^2$ with
$g_2 = (15 \beta n / 8 + n - 1) f(r)$. It is easy to verify that $g_1(B)$ is an increasing
function of B and that it is always smaller than $g_2$ for $B \le 1/2$ because $\beta \le 1$.
For $B > 1/2$, i.e. p = 1, $g_1(B)$ is approximately equal to $g_2$ if we set B = 23/32.

5 Rearrangeable Multirate Multicast

Yang and Masson [9] gave a nonblocking 3-stage multicast Clos network
C(n, r, m) for pure circuit switching if the number of middle stage switches is
larger than $(n - 1) \min_k (k + r^{1/k})$. In this section, we show that the Clos network
is rearrangeable for multirate multicast communications in some special discrete
bandwidth cases. Each multicast request is assumed to have a normalized
weight from a given finite set $\{p_1, p_2, \ldots, p_h\}$, where $p_3 \mid p_2, \ldots, p_h \mid p_{h-1}$ and
$1 \ge p_1 > 1/2 \ge p_2 > \cdots > p_h > 0$. This is called integer multiplicity of discrete
bandwidths for $p_2$ to $p_h$. The rearrangement algorithm orders the requests by
their weights and routes the heaviest requests first, each of which is restricted to
use at most k middle switches. To route the next heaviest requests, the algorithm
does not disturb the heavier requests which were already routed, and routes
them by using at most k middle switches. It continues to route other requests
until the lightest requests are successfully routed.

For a new multicast request, let us consider how many middle stage switches
are needed to satisfy the request. Because the maximum fan-out is limited to r
for a symmetric Clos network C(n, r, m) as in Section 3, we use at most r middle
switches. We are also able to discover that n - 1 is another upper bound for
compatible multicast requests.

Theorem 3. A symmetric Clos network C(n, r, m) is multirate rearrangeable
when each connection has a weight chosen from a given finite set $\{p_1, p_2, \ldots, p_h\}$,
where $p_3 \mid p_2, \ldots, p_h \mid p_{h-1}$, and $1 \ge p_1 > 1/2 \ge p_2 > \cdots > p_h > 0$, if

$$m > (n - 1) \min_{1 \le k \le \min(n-1,\, r)} \big(k + r^{1/k}\big). \tag{11}$$






Proof. We will prove this theorem by induction on h. For h = 1, each link can
carry no more than one call due to $p_1 > 1/2$, so the Clos network is nonblocking
and rearrangeable. Assume that the Clos network is rearrangeable for $h = h' - 1$.

Consider two integers u and v such that $p_1 + u p_{h'} \le 1 < p_1 + (u + 1) p_{h'}$ and
$v p_{h'} \le 1 < (v + 1) p_{h'}$. If a link blocks a new connection request of weight $p_{h'}$, the
blocking link is carrying either one $p_1$-call and weights of u $p_{h'}$-calls (U-blocking),
or v $p_{h'}$-calls (V-blocking). Let us assume more than (n - 1)k middle switches
are blocking the new $p_{h'}$-call from the input stage. Because all connection requests
were able to duplicate their messages at most k times at the input switch, at
least n input ports should carry full weights, that is, $p_1 + u p_{h'}$ or $v p_{h'}$. This
is a contradiction to our assumption that the new $p_{h'}$-call is compatible.

Suppose that no k middle switches among the remaining $m''$ can satisfy the new
$p_{h'}$-call. Then, we can obtain $m'' \le (n - 1) r^{1/k}$. The total number of blocking links
between the middle stage and the output stage is no more than (n - 1)r because each
output switch has at most (n - 1) output ports which are either U-blocking
or V-blocking for the new call. By using a similar approach as in the previous
section, we can obtain



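$$m'' \le (n - 1)\, r / c_1 \tag{12}$$
$$m'' \le (n - 1)\, c_{i-1} / c_i \quad \text{for } i < k \tag{13}$$
$$m'' \le (n - 1)\, c_{k-1} \quad \text{for } i = k \tag{14}$$

The minimum of a sequence is not larger than its geometric mean so that, from
(12), (13) and (14), we can obtain

$$m'' \le \left[ (n-1)\frac{r}{c_1} \cdot (n-1)\frac{c_1}{c_2} \cdots (n-1)\frac{c_{k-2}}{c_{k-1}} \cdot (n-1)\, c_{k-1} \right]^{1/k} = (n - 1)\, r^{1/k}$$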
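The integers u and v used in the proof can be computed exactly for a concrete weight set. The sketch below uses made-up weights ($p_1 = 3/5$ and $p_{h'} = 1/8$); the function name `blocking_multiplicities` is hypothetical.

```python
from fractions import Fraction


def blocking_multiplicities(p1, ph):
    """Largest u with p1 + u*ph <= 1 and largest v with v*ph <= 1."""
    p1, ph = Fraction(p1), Fraction(ph)
    u = (1 - p1) // ph  # floor division of Fractions yields an int
    v = 1 // ph
    return int(u), int(v)


p1, ph = Fraction(3, 5), Fraction(1, 8)  # illustrative weights only
u, v = blocking_multiplicities(p1, ph)
```

With these weights a blocking link carries either $p_1 + 3 p_{h'} = 39/40$ or $8 p_{h'} = 1$, and in both cases one more $p_{h'}$-call would exceed the link capacity of 1.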



We showed that the nonblocking multicast Clos network for pure circuit
switching is also rearrangeable for multirate multicast communications if each
request has a weight chosen from integer multiplicity of discrete bandwidths for
$p_2$ to $p_h$ and $1 \ge p_1 > 1/2 \ge p_2 > \cdots > p_h$. In the following, we observe other
special cases that provide more flexibility to the finite set of weights.



Corollary 1. A symmetric Clos network C(n, r, m) is multirate rearrangeable
when each connection has a weight chosen from a given finite set $\{p_1, p_2, \ldots, p_h\}$,
where $p_{i+1} \mid p_i,\ p_{i+2} \mid p_{i+1}, \ldots, p_h \mid p_{h-1}$, and
$1 \ge p_1 > p_2 > \cdots > p_{i-1} > 1/2 \ge p_i > p_{i+1} > \cdots > p_h > 0$, if

$$m > (n - 1) \min_{1 \le k \le \min(n-1,\, r)} \big(k + r^{1/k}\big) \tag{15}$$

Proof. We apply the same idea as in Theorem 3, but consider i integers
$u_1, u_2, \ldots, u_{i-1}$ and v such that $p_j + u_j p_{h'} \le 1 < p_j + (u_j + 1) p_{h'}$ for $j \le i - 1$
and $v p_{h'} \le 1 < (v + 1) p_{h'}$.

Corollary 2. A symmetric Clos network C(n, r, m) is multirate rearrangeable
when each connection has a weight chosen from a given finite set $\{p_1, p_2, \ldots, p_h\}$,
where $p_2 \mid p_1,\ p_3 \mid p_2, \ldots, p_h \mid p_{h-1}$, and $1 \ge p_1 > p_2 > \cdots > p_h > 0$, if

$$m > (n - 1) \min_{1 \le k \le \min(n-1,\, r)} \big(k + r^{1/k}\big) \tag{16}$$

Proof. Consider an integer u such that $u p_i \le 1 < (u + 1) p_i$ for each iteration.

6 Conclusions 

We have studied the construction of multirate 3-stage Clos switching networks
which are nonblocking in a wide sense for multicast communications. To overcome
the obstacle that other nonblocking multirate Clos networks require an infinite
number of switch elements in the worst case, the middle stage switches
were partitioned into two or three subsets and the routing algorithm allowed
connection requests to utilize one of the subsets according to their normalized
weights. The hardware complexities of the networks were determined only by the
configurations of the networks themselves but not by the patterns of connection
requests, which is highly desirable when building a real large-scale switching
network. The nonblocking circuit switching multicast Clos network was also shown
to be rearrangeable for some discrete multirate multicast communications. The
rearrangeable routing algorithm sorted connection requests by their normalized
weights and routed the heaviest requests first, then the next heaviest, and so
forth in nonblocking fashion.

Abstract. The mesh of buses (MBUS) is a parallel computation model
which consists of n x n processors, n row buses and n column buses
but no local connections between two neighboring processors. As for
deterministic (permutation) routing on MBUSs, the known 1.5n upper
bound appears to be hard to improve. Also, the information theoretic
lower bound for any type of MBUS routing is 1.0n. In this paper, we
present two randomized algorithms for MBUS routing. One of them runs
in 1.4375n + o(n) steps with high probability. The other runs in 1.25n + o(n)
steps, also with high probability, but needs more local computation.

1 Introduction 

The two dimensional mesh is widely considered to be a promising parallel
architecture for its scalability [Lei92,MS96]. In this architecture, processors are
naturally placed at intersections of horizontal and vertical grids, while there can
be two different types of communication links: The first type is shown in
Figure 1-(1). Each processor is connected to its four neighbors and such a system
is called a mesh-connected computer (an MC for short). Figure 1-(2) shows the
second type: Each processor is connected to a couple of (row and column) buses.
The system is then called a mesh of buses (an MBUS for short).

Permutation routing (simply routing in this paper) is apparently a basic form
of communication among the processors: The input is given by packets that
are initially held by the n x n processors, one by each. Routing requires that
all such packets be moved to their destinations that are mutually distinct.
In the case of MCs, a 2n - 2 lower bound comes from a fundamental nature
of the model, i.e., the physical distance between the farthest two processors.
Also, the same 2n - 2 upper bound can be achieved by an elementary algorithm
based on the dimension-order strategy [Lei92,Tom94]. Thus there remains little
for further research in the case of MCs. (This is not true for limited buffer-size
as mentioned later.) In the case of MBUSs, on the other hand, there is a wide
margin between the known upper and lower bounds. First of all, unlike the case
of MCs, the dimension-order strategy only gives us a poor algorithm which takes
trivial 2n steps. The best upper and lower bounds known are 1.5n and $(1 - \varepsilon)n$,
respectively [IMK96]. The 1.5n bound appears to be hard to improve; it is also
known to be a (tight) lower bound if we impose the so-called “source-oblivious”
condition [IM97a].

The main purpose of this paper is to decrease this 1.5n upper bound by 
allowing randomization. Two randomized algorithms are given: One of them runs 
in 1.4375n + o(n) steps with high probability. The other runs in 1.25n + o(n) 
steps but needs more local computation. The idea is an efficient use of the buses 
and a reduction of packet collisions. Consider, for example, the (deterministic) 
dimension-order routing where each packet first moves horizontally (in the order 
of original position) using the first n steps and then moves vertically (in the 
order of destination position) in the second n steps. One can see that column 
buses are completely idle in the first n steps and so are row buses afterwards in 
this algorithm. 

A simple attempt to avoid this inefficiency is to try to move the first n packets
vertically immediately after they move horizontally. If those n packets go to all
different n columns, i.e., they have all different column destinations, then we can
do this in a single step without collision. If a collision happens in some column,
then we can use randomization techniques to resolve the collision. If few collisions
occur, then we might achieve an approximately 1.0n upper bound. Unfortunately,
however, this observation is too optimistic. Some experiments show that a lot
of collisions occur even for a random permutation. It seems that this approach
gives us no better bounds than 3n; it is much worse than the deterministic
version. Thus, an efficient use of buses tends to imply more collisions. Our first
algorithm avoids this difficulty in a tricky way. The second one is based on a novel
use of the technique that allows us to generate many pseudo-random numbers
deterministically from a few random numbers.

Research on mesh routing has a long history and a huge literature.
Nevertheless there still remain a lot of unknowns. For example, our knowledge of
the 3-dimensional (3-D) mesh is much weaker than that of the 2-D mesh. Recently,
it was shown [IM97b] that minimum-bending oblivious routing on the 3-D mesh
needs $\Omega(N^{2/3})$ steps, which is much more than $O(N^{1/2})$ for the 2-D mesh (N is
the total number of processors). It is not known either whether we can substantially
improve the $O(n^2)$ upper bounds for 2-D oblivious routing on MCs with
constant buffer-size [CLT96,Kri91]. There is also a gap between the known upper
and lower bounds, $(1 + \varepsilon)n + o(n)$ and 0.691n, respectively, for 2-D routing on
the mesh equipped with both buses and local connections [CL93,LS94].



2 Models and Problems 

An MBUS consists of $n^2$ processors, $P_{i,j}$, $1 \le i, j \le n$, and n row and n column
buses. $P_{i,j}$ is connected to the ith row bus and the jth column bus. The problem
of permutation routing on the MBUS is defined as follows: The input is given
by packets that are initially held by the $n^2$ processors, one by each. Each
packet, (s, d, a), consists of three portions; s is a source address that shows
the initial position of the packet, d is a destination address that specifies the
processor to which the packet should be moved, and a is a data portion that is
not important in this paper. No two packets have the same destination address.
Routing requires that all such packets be moved to their correct destinations.

Our discussion throughout this paper is based on the following four rules on
the model: (i) We follow the common practice on how to measure the running
time of MBUSs: One-step computation of each processor P consists of (a) reading
the current data on both row and column buses P is connected to, (b) executing
arbitrarily complicated instructions using the local memory and (c) if necessary,
writing data to the row and/or column buses. The written data will be read in
the next step. (ii) The queue size is not bounded, namely, an arbitrary number
of packets can stay on a single processor temporarily. (iii) What can be written
on the buses by the processor P must be the packet originally given to P as its
input packet or one of the packets that have been read so far by P from its row or
column bus. (Nothing other than packets can be written.) This means that any
kind of data compression is not allowed. (iv) We allow the simultaneous write.
However, if two or more packets are written on the same bus simultaneously, then
a special value flows on the bus, which has no information other than collision.

As mentioned in the previous section, the 2n-step dimension-order routing
moves horizontally the leftmost n packets initially placed on $P_{1,1}, P_{2,1}, \ldots, P_{n,1}$
in step 1, $P_{1,2}, P_{2,2}, \ldots, P_{n,2}$ in step 2 and so on. Namely, packets are moved
in their “source-order” in this first stage. In the second stage, n packets whose
destinations are the uppermost n processors, $P_{1,1}, P_{1,2}, \ldots, P_{1,n}$, are moved
vertically in step 1, then $P_{2,1}, P_{2,2}, \ldots, P_{2,n}$, and so on. Thus they are moved in
the “destination-order” regardless of their current positions. It should be noted
that this destination-order transmission can only be used after all the packets
have moved horizontally. That is why column buses are completely idle in the
first stage. If we do not wait, then we have to give up the destination-order
transmission and encounter the more serious problem, i.e., packet collisions, as
described in Section 1.

The 1.5n-step algorithm, called DR4 from now on, reduces the number of
first-stage steps from n to 0.5n as follows: The whole n x n plane is divided into
four 0.5n x 0.5n subplanes. Packets in the upper-left 0.5n x 0.5n and the
lower-right 0.5n x 0.5n subplanes are moved horizontally and those in the upper-right
and the lower-left subplanes vertically, both in the source-order. Thus, all the
buses are used in the first stage, which reduces its computation time by one half.
The second stage is almost the same as before.

3 1.4375n + o(n) Randomized Algorithm 

Note that there are $n^2$ packets, 2n buses and each packet has to ride on a bus
twice (in general). Thus $2 \times n^2 / 2n = n$ steps are needed even if we have no idle
buses. In the 1.5n-step algorithm DR4, Stage 1 has no idle buses. However, it is
impossible to improve Stage 2 since we can create an instance as an “adversary”
which leaves n packets on a single bus after Stage 1. (Detours might help but it
seems difficult to design an algorithm that exploits the possibility.)

Our first randomized algorithm, RR, is based on DR4. The basic idea is as
follows: (i) We should avoid, for any instance, the bad case where n packets are
gathered on a single bus after Stage 1. (ii) In other words, we should distribute
packets evenly so that each single bus has approximately 0.5n packets at Stage 2.
(iii) Then it is not so hard to design a randomized algorithm for Stage 2 that
needs more than the optimal 0.5n steps but for which some 0.75n steps are enough.
(iv) In order to accomplish the even distribution in (ii), we now have to give up the
very efficient Stage 1 of DR4. Some loss of performance is inevitable, but if we
can keep it less than 0.75n steps then the 1.5n bound in total can be improved.

Algorithm: RR

Stage 1. The whole plane is divided into four subplanes as in DR4. This stage
consists of Stages 1-1 and 1-2.

Stage 1-1. For a while, we only look at the upper-left subplane. The 0.5n
processors in each row are divided into 0.125n blocks; each block includes four
consecutive processors. For example, $(P_{1,1}, P_{1,2}, P_{1,3}, P_{1,4})$ is the first block of
row 1, $(P_{1,5}, P_{1,6}, P_{1,7}, P_{1,8})$ is the second block and so on. Now for j = 1, 2, ...,
through 0.125n, namely for each block from left to right, the following Phases 1
and 2 are executed. The same operation is executed on each row bus in parallel;
the description below is for row i:

Phase 1: The first two processors of block j, i.e., $P_{i,4j-3}$ and $P_{i,4j-2}$, each write
their initial packets on the row bus with probability 1/2.

Phase 2: One of the following four operations is selected according to the result
of Phase 1. Note that all the processors on the bus can figure out which case
occurred.

1. If the packet whose source address is (i, 4j - 3) was on the bus, i.e., if only
$P_{i,4j-3}$ wrote the packet, then $P_{i,4j-1}$ writes its initial packet on the row
bus. Go to the next block. (See Figure 2-(1).)

2. If only $P_{i,4j-2}$ wrote the packet, then $P_{i,4j-1}$ writes its initial packet on the
row bus. Go to the next block. (See Figure 2-(2).)

3. If collision occurred, i.e., if both $P_{i,4j-3}$ and $P_{i,4j-2}$ wrote the packets, then
$P_{i,4j-3}$ again writes its packet and then $P_{i,4j}$ writes its packet on the row
bus. Thus we need three steps in this (and the next) case. Go to the next block.
(See Figure 2-(3).)

4. If the bus was idle, i.e., if neither $P_{i,4j-3}$ nor $P_{i,4j-2}$ wrote the packets, then
$P_{i,4j-2}$ writes its packet and then $P_{i,4j}$ writes its packet on the row bus. Go
to the next block. (See Figure 2-(4).)

Thus exactly two packets out of four in each block are moved horizontally.
The remaining two packets, called m-packets, are moved vertically in Stage 1-2.

Stage 1-2. Now the 0.5n processors in each column are divided into 0.25n
blocks, i.e., each block has two processors. For i = 1, 2, ..., through 0.25n,
namely for each block from top to bottom, Phases 1 and 2 are executed:

Phase 1: Each of the two processors, $P_{2i-1,j}$ and $P_{2i,j}$, in block i writes its
m-packet (if any).

Phase 2: Again one of the four operations is selected:

1. If only $P_{2i-1,j}$ wrote the packet, then go to the next block. (See Figure 3-(1).)

2. If only $P_{2i,j}$ wrote the packet, then go to the next block. (See Figure 3-(2).)

3. If neither $P_{2i-1,j}$ nor $P_{2i,j}$ wrote the packet, then go to the next block. (See
Figure 3-(3).)

4. If both $P_{2i-1,j}$ and $P_{2i,j}$ wrote the packets, then $P_{2i-1,j}$ writes its packet
and then $P_{2i,j}$ writes its packet on the column bus. Go to the next block.
(See Figure 3-(4).)

That concludes Stage 1 for the upper-left subplane. The algorithm is exactly
the same for the lower-right subplane. As for the upper-right and the lower-left
subplanes, we exchange rows and columns, i.e., Stage 1-1 uses columns and
Stage 1-2 rows.
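The four cases of Stage 1-1 can be enumerated exhaustively for one block. The sketch below is an illustration, not the authors' code: positions 1 to 4 stand for $P_{i,4j-3}, \ldots, P_{i,4j}$, the two booleans say whether the first and second processor wrote in Phase 1, and the function name `stage_1_1_block` is hypothetical.

```python
from fractions import Fraction
from itertools import product


def stage_1_1_block(first_writes, second_writes):
    """Return (steps used, positions moved horizontally) for one block."""
    if first_writes and not second_writes:   # case 1: only the first wrote
        return 2, {1, 3}
    if second_writes and not first_writes:   # case 2: only the second wrote
        return 2, {2, 3}
    if first_writes and second_writes:       # case 3: collision
        return 3, {1, 4}
    return 3, {2, 4}                         # case 4: idle bus


# Each of the four Phase 1 outcomes has probability 1/4.
outcomes = [stage_1_1_block(a, b) for a, b in product([False, True], repeat=2)]
expected_steps = Fraction(sum(steps for steps, _ in outcomes), len(outcomes))
move_probability = {
    pos: Fraction(sum(pos in moved for _, moved in outcomes), len(outcomes))
    for pos in range(1, 5)
}
```

Every outcome moves exactly two packets, a block costs 2.5 steps on average, and each of the four packets is moved horizontally with probability 1/2, so each has the same chance of becoming an m-packet.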

Stage 2. Every packet has already moved to its row or column destination.
Thus the situation is the same as at the beginning of Stage 2 of DR4. The
difference is that the number of packets held by processors on each single bus is
evenly distributed, i.e., is about 0.5n (proof is given later). Let us look at some
single bus, say, row 1. The basic idea of Stage 2 is to use the destination-order
counterpart of Stage 1-2. Namely, at the first step, if a processor has a packet
whose destination is (1,1) or (1,2) then it writes that packet on the row bus.
If no collision occurs, i.e., if at most one of the two packets exists on this row,
then we move forward. Otherwise, two more steps are used for sending each of
those collided packets.

Unfortunately, this algorithm does not work. The reason is that the two 
packets whose destinations are (1,1) and (1,2) may be held by some single 
processor. If this is the case, that processor puts either one on the bus but other 
processors following the above algorithm assume that there is only one packet 
on this row whose destination is in the first block. 

To solve this problem, we introduce a “special packet” (SP for short) whose
purpose is to broadcast special information. (If one does not like to use a packet
for such a special purpose, then we can give another algorithm whose performance
is a little bit worse than the current one. See the end of this section.)
As an SP, we use the packet whose destination is (n, n). At the beginning of
Stage 2, we introduce two extra steps to let all the processors know this SP.
Now for i = 1, 2, ..., through 0.5n, i.e., for each block from left to right, the
following two phases are executed:

Phase 1: Let $Q_1$ and $Q_2$ be the packets whose destinations are in block i. For
each processor P on this row, if P holds either $Q_1$ or $Q_2$ then P writes that
packet on the bus. If P holds both $Q_1$ and $Q_2$, then P writes the SP.

Phase 2: One of the following four operations is selected:

1. If $Q_1$ flowed on the bus in Phase 1, then all the processors move on to the
next iteration, namely, for i + 1.

2. If $Q_2$ flowed, then all the processors move on to the next iteration also.

3. If nothing flowed then all the processors move on to the next iteration also.

4. If the SP flowed or collision occurred, then we need two more steps; the processor
holding $Q_1$ writes it first and then the processor holding $Q_2$ follows.

We have not yet stated when Stage 2 should be started. Our design of RR
starts Stage 2 at some fixed step that is no later than 0.6875n + o(n).
At this moment all processors have finished Stage 1 with high probability (see
the proof of Theorem 1). However, there is a slight chance that some processor
is still executing Stage 1. If that is the case, then some unexpected data-collision
will happen in Stage 2 and the algorithm fails. Improving it so that it never fails
is possible but needs some technical details. For this purpose, we can use the SP
again, which can also be avoided at the expense of extra o(n) steps (omitted).

Theorem 1. With high probability, RR can route any instance within
1.4375n + o(n) steps.

Proof. We first calculate the expected number of steps each stage takes. 
Then it is proved that the probability that RR takes essentially more steps than 
the average is very low. 

Stage 1-1. See Phase 2. The probabilities for the cases 1 to 4 are all the same, 
i.e., 1/4. Cases 1 and 2 take two steps and 3 and 4 take three steps. Therefore, 
it takes 2.5 steps for each block on average or 0.3125n steps for 0.125n blocks. 

Stage 1-2. Consider an arbitrary column in the upper-left subplane. It is
not hard to see that each processor on this column holds an m-packet with
probability 1/2 and furthermore that this occurs independently between any
pair of processors on this column. In Phase 2, the cases 1 to 3 take one step and
the case 4 three steps. Hence we need 1.5 steps on average per block or 0.375n
steps for 0.25n blocks.

Stage 2. Let us calculate the probability of the cases 1 to 4 in Phase 2, where
we have to be a bit more careful than before.

Case a: $Q_1$ and $Q_2$ come from different blocks of Stage 1-1. Then one can
see easily that the probabilities for 1 to 4 are the same, i.e., 1/4 for each.

Case b: $Q_1$ and $Q_2$ come from the same block. Then we should check many
different possibilities, such as coming from the top two positions, from the
middle two positions, and so on. However, it turns out that in any possibility, the
probability for the case 4 is at most 1/4 (1/4 or 0, in fact).

Recall that the cases 1 to 3 need one step and the case 4 needs three steps. 
Hence we need at most 1.5 steps on average per block or at most 0.75n steps for 
0.5n blocks. 
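The expected step counts of the three stages add up to the 1.4375n term of Theorem 1. The following sketch just redoes that arithmetic exactly; the per-block costs are the ones stated above (with the worst-case probability 1/4 for case 4 in Stage 2), and the dictionary layout is an illustrative choice.

```python
from fractions import Fraction

# name -> (blocks per bus as a fraction of n, expected steps per block)
stages = {
    "Stage 1-1": (Fraction(1, 8), Fraction(2 + 2 + 3 + 3, 4)),
    "Stage 1-2": (Fraction(1, 4), Fraction(1 + 1 + 1 + 3, 4)),
    "Stage 2": (Fraction(1, 2), Fraction(1 + 1 + 1 + 3, 4)),
}

# Expected number of steps of each stage, as a multiple of n.
expected = {name: blocks * steps for name, (blocks, steps) in stages.items()}
total = sum(expected.values())
```

The three contributions are 5/16, 3/8 and 3/4 of n, that is 0.3125n, 0.375n and 0.75n, for a total of 23/16 n = 1.4375n.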

Now we shall evaluate the probability for bad behaviors using the Chernoff
bound [Che52]:

Lemma 1. Let $X_1, X_2, \ldots, X_n$ be independent Bernoulli trials having binomial
distribution $B(n, p)$, $0 < p < 1$. Let $X = \sum_{i=1}^{n} X_i$ and $\mu = E[X]$. Then, for any
$0 < \varepsilon \le 1$,

$$\Pr[X \ge (1 + \varepsilon)\mu] \le \exp\left(-\frac{\varepsilon^2 \mu}{3}\right).$$

Stage 1-1. For the time being, we only consider some single bus. For block i,
let $X_i = 2$ when the case 1 or 2 occurs and $X_i = 3$ otherwise. Then $X = \sum X_i$
has a binomial distribution $B(0.125n, 1/2)$, and $\mu = E[X] = 0.3125n$. Apply the
Chernoff bound with $\varepsilon = c_1 \sqrt{n \ln n} / \mu$ for $c_1 > 0$. Then,

$$\Pr[X \ge 0.3125n + c_1 \sqrt{n \ln n}\,] \le \exp\left(-\frac{c_1^2\, n \ln n}{3\mu}\right) = n^{-d_1}$$

for some $d_1 > 0$. Namely, the number of steps for Stage 1-1 is at most
$0.3125n + c_1 \sqrt{n \ln n}$ with probability $1 - n^{-d_1}$.

Stage 1-2. Our analysis is almost the same. Let $X_i = 1$ when one step is
needed and $X_i = 3$ when three steps are needed; then the number of steps for
this stage is at most $0.375n + c_2 \sqrt{n \ln n}$ with probability $1 - n^{-d_2}$. Thus Stage 1
takes at most 0.6875n + o(n) steps with high probability.
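The tail estimate for Stage 1-1 can be evaluated numerically. The sketch below assumes the common textbook form of the upper-tail Chernoff bound, $\exp(-\varepsilon^2 \mu / 3)$ for $0 < \varepsilon \le 1$; the constant 3 is that assumption and may differ from the constant in the original lemma. The function names and the default $c_1 = 2$ are hypothetical.

```python
import math


def chernoff_tail(mu, eps):
    """Upper-tail bound exp(-eps^2 * mu / 3), assumed valid for 0 < eps <= 1."""
    if not 0 < eps <= 1:
        raise ValueError("eps must lie in (0, 1]")
    return math.exp(-eps * eps * mu / 3)


def stage_1_1_failure_bound(n, c1=2.0):
    """Bound on Pr[Stage 1-1 needs more than 0.3125n + c1*sqrt(n ln n) steps]."""
    mu = 0.3125 * n
    eps = c1 * math.sqrt(n * math.log(n)) / mu
    return chernoff_tail(mu, eps)


n = 10 ** 6
bound = stage_1_1_failure_bound(n)
d1 = 2.0 ** 2 / (3 * 0.3125)  # exponent d_1 under the assumed constant 3
```

Under this assumption the bound is exactly $n^{-d_1}$ with $d_1 = c_1^2 / 0.9375$, so a larger $c_1$ buys a polynomially smaller failure probability at the cost of a larger o(n) slack.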

Stage 2 . Let = 3 when the case 4 occurs (i.e., three steps are needed) 
and Yi = 1 otherwise. This time, however, Ti, T 2 r * * ? ^o. 5 n may not be totally 
independent. For example, if two packets heading for block 1 come from the top 
and third positions of some block (on the upper-right or the lower-left subplane) 
in Stage 1-1, then there is no possibility that two packets heading for block 2 
come from the second and fourth positions of the same block. Namely, if Ti = 3 
(with probability 1/4) then I 2 must be 1. However, we can show that the sum 
of those non-independent random variables does not deviate from its expected 
value with high probability using the following lemma [MR95]: 

Lemma 2 . Let Aq, Ai, A 2 , • • • he a martingale sequence such that for each 

\^k — ^k-i\ < c 

where c is some constant independent of k, Then^ for all t > 0 and any A > 0^ 
Pr[ \Xt - Xo| > AcVt ] < 2exp(-A72). 

Let $Y = \sum_{i=1}^{0.5n} Y_i$ and define a martingale sequence $X_0, X_1, \ldots, X_{0.5n}$ by setting $X_0 = E[Y]$ and, for $1 \le i \le 0.5n$, $X_i = E[Y \mid Y_1, Y_2, \ldots, Y_i]$. Now we evaluate $|X_i - X_{i-1}|$ for $1 \le i \le 0.5n$. Note that the difference between $X_i$ and $X_{i-1}$ depends only on the value of $Y_i$, which is determined by the behavior of two packets heading for the $i$th block. The behavior of each of those two packets affects the behavior of at most three other packets in its block of Stage 1 and hence at most three other random variables $Y_j$. Therefore, conditioning on the value of $Y_i$ affects at most seven $Y$'s (including $Y_i$ itself), and the difference between $X_i$ and $X_{i-1}$ is bounded by some constant $c_3 \le 3 \times 7 = 21$ since $Y_i \le 3$. By applying Lemma 2 with $t = n/2$ and $\lambda = \sqrt{2 d_3 \ln n}$ for some $d_3 > 0$, one can conclude that the number of steps for Stage 2 is at most $0.75n + c_3\sqrt{d_3 n \ln n}$ with probability $1 - 2n^{-d_3}$.
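
The arithmetic behind this choice of t and λ can be checked numerically. The following is only an illustrative sketch: the function name `stage2_bound` and the sample constants are assumptions made here, not part of the paper. It plugs t = n/2 and λ = sqrt(2 d3 ln n) into the martingale bound of Lemma 2 and returns the resulting step bound and failure probability.

```python
import math


def stage2_bound(n, c3, d3):
    """Evaluate the Lemma 2 bound for Stage 2 (illustrative sketch).

    n  : number of processors on a bus
    c3 : bound on |X_i - X_{i-1}| (at most 21 in the text)
    d3 : any positive constant
    """
    t = n / 2                                  # length of the martingale sequence
    lam = math.sqrt(2 * d3 * math.log(n))      # lambda chosen in the text
    deviation = lam * c3 * math.sqrt(t)        # lambda * c * sqrt(t)
    failure = 2 * math.exp(-lam ** 2 / 2)      # 2 * exp(-lambda^2 / 2)
    return 0.75 * n + deviation, failure


if __name__ == "__main__":
    steps, failure = stage2_bound(10_000, 21, 2)
    print(steps, failure)
```

With these values the deviation term simplifies to c3 * sqrt(d3 * n * ln n) and the failure probability to 2 * n^(-d3), matching the closed forms stated in the proof.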

We have 2n buses. So, the probability that the bad behavior occurs in at least one bus can be as large as 2n times the probability for a single bus. However, since the latter can be written as $n^{-d}$ for a large enough constant $d$, we do not have to worry about that. As a result, the whole algorithm takes at most $1.4375n + o(n)$ steps with high probability. □

Remark 1. We can also design Stage 2 without using an SP. Phase 1: If P holds both $Q_1$ and $Q_2$, then P writes $Q_1$ and then writes $Q_2$ at the next step. Phase 2: Steps 1 to 3 are the same. 4. If a collision occurred, then we examine which case happened in the previous block. If it is case 1, then the collision might be caused by the packet $Q_2$ of the previous block. So, we insert an extra step to send this $Q_2$ (but it may be idle if the collision is caused by two packets of the current block). Although details are omitted, the algorithm runs in at most $1.46875n + o(n)$ steps with high probability, which is a little worse than Theorem 1 but is still better than $1.5n$.

Remark 2. Local computation in each step is very simple in RR. That is clearly an advantage of this algorithm over the next one.

4 1.25n + o(n) Randomized Algorithm 

Recall that there are n processors on each bus and that roughly one half of these n processors put their packets on the bus in the previous algorithm RR. However, it takes much more time than 0.5n to finish the transmission. An obvious reason is the large number of packet collisions: if each processor P knew whether or not each other processor on the same bus is currently trying to write its packet, then P could calculate a proper time-slot at which it should write its own packet without collision or waste of the bus.

This is in fact possible if P knew all the random numbers the other processors have generated. To this end, one can use the technique of generating many pseudo-random numbers deterministically from a few random numbers. Some preliminaries are needed: Let $X_1, X_2, \ldots, X_n$ be discrete random variables defined on the same probability space. Such a set of random variables is said to be pairwise independent if for all $i \ne j$,

$$\Pr[X_i = x \mid X_j = y] = \Pr[X_i = x].$$

This pairwise independence is naturally extended to k-wise independence: A set of similarly defined random variables $X_1, \ldots, X_n$ is said to be k-wise independent if for all different $i_1, i_2, \ldots, i_k$,

$$\Pr[X_{i_1} = x_{i_1} \mid X_{i_2} = x_{i_2}, X_{i_3} = x_{i_3}, \ldots, X_{i_k} = x_{i_k}] = \Pr[X_{i_1} = x_{i_1}].$$

Lemma 3 (see [Jof74]). Let $m$ be a prime number and $Z_m$ denote the field of integers modulo $m$. Then the set of $n < m$ random variables $X_1, \ldots, X_n$ calculated by the following equation from the $k$ numbers $a_1, \ldots, a_k$, which are randomly chosen from $Z_m$, are k-wise independent:

$$X_i = a_1 i^{k-1} + a_2 i^{k-2} + \cdots + a_{k-1} i + a_k \bmod m. \quad (1)$$
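
Equation (1) is a polynomial of degree k - 1 evaluated at the index i, so it can be computed with Horner's rule. The sketch below is a minimal illustration under stated assumptions: the function names `kwise_values` and `sample_kwise` are hypothetical, and the coefficients are drawn uniformly from {0, ..., m - 1}.

```python
import random


def kwise_values(coeffs, n, m):
    """Return [X_1, ..., X_n] with X_i = a_1*i^(k-1) + ... + a_k mod m.

    coeffs is the list [a_1, ..., a_k]; m is assumed to be a prime with n < m.
    """
    values = []
    for i in range(1, n + 1):
        acc = 0
        for a in coeffs:              # Horner's rule, highest power first
            acc = (acc * i + a) % m
        values.append(acc)
    return values


def sample_kwise(n, m, k, rng=random):
    """Draw k truly random coefficients and expand them into n values."""
    coeffs = [rng.randrange(m) for _ in range(k)]
    return coeffs, kwise_values(coeffs, n, m)


if __name__ == "__main__":
    print(sample_kwise(n=10, m=101, k=6))
```

For k = 2 and a small prime one can enumerate all coefficient pairs and confirm that every pair of values (X_1, X_2) occurs exactly once, which is what pairwise independence requires.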

Consider for example the n processors $P_1, \ldots, P_n$ on the first row. The leftmost processor $P_1$ generates k (truly) random numbers $a_1, \ldots, a_k$ and transmits them to $P_2$ through $P_n$. Then each $P_i$ can generate its own random number $X_i$ by equation (1). The set of $X_i$'s is guaranteed to be k-wise independent. The degree of randomness for each $X_i$ is smaller than before, but we can show that it is enough for our purpose when k = 6.

We have another technical problem: how to transmit $a_1, \ldots, a_k$. Recall that our rule is that only packets can be transmitted on the bus. Fortunately, the amount of information carried by $a_1, \ldots, a_k$ is not too large since k is a constant. Moreover, we can set $m$ to be nearly equal to $n^2$, so the total number of bits of $a_1, \ldots, a_k$ is $O(\log n)$. Consequently the following simple algorithm works: (i) $P_1$ creates $a_1, \ldots, a_k$. (ii) Suppose that the bit sequence of $a_1, \ldots, a_k$ (possibly encoded) is $b_1, b_2, b_3, \ldots$. Then $P_1$ puts its original packet repeatedly on the bus, writing at time-slot $i$ if $b_i = 1$ and writing nothing if $b_i = 0$. This takes only $O(\log n)$ steps.
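
The signalling trick above can be sketched as follows. This is an illustration only: the fixed-width binary encoding and the names `encode_slots` and `decode_slots` are assumptions, since the text only says the numbers "may be encoded". A 1 in the list means the sender writes its packet in that time-slot, and a 0 means the bus stays silent.

```python
def encode_slots(coeffs, bits_per_value):
    """Turn a_1..a_k into one bus-activity bit per time-slot (MSB first)."""
    bits = []
    for a in coeffs:
        for pos in reversed(range(bits_per_value)):
            bits.append((a >> pos) & 1)
    return bits


def decode_slots(bits, bits_per_value):
    """Recover a_1..a_k from the observed pattern of busy and idle slots."""
    coeffs = []
    for start in range(0, len(bits), bits_per_value):
        value = 0
        for b in bits[start:start + bits_per_value]:
            value = (value << 1) | b
        coeffs.append(value)
    return coeffs


if __name__ == "__main__":
    slots = encode_slots([5, 100, 0, 42, 7, 99], bits_per_value=7)
    print(len(slots), decode_slots(slots, 7))
```

With k constant and each value below m, the number of slots is k times the bit length of m, which is logarithmic in n when m is polynomial in n.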

Now we are ready to give our new algorithm RRk, which consists of two stages as before. In the first stage, about one half of all the packets move horizontally to their column destinations, and the rest move to their row destinations. The second stage is exactly the same as in the previous algorithm RR.

Algorithm: RRk

Stage 1. Choose any prime number $m > n^2$. Let $Z_m = \{1, 2, \ldots, m-1\}$, $Z_m^1 = \{1, \ldots, \frac{m-1}{2}\}$ and $Z_m^0 = \{\frac{m-1}{2} + 1, \ldots, m-1\}$. Note that $|Z_m| = m - 1$ and $|Z_m^1| = |Z_m^0| = (m-1)/2$ ($m - 1$ is even since $m$ is a prime number).

Phase 1: $P_{1,1}$ generates k random numbers $a_1, \ldots, a_k \in Z_m$ (k will be set to six when the probability of success is calculated).

Phase 2: $P_{1,1}$ transmits $a_1, \ldots, a_k$ to all the processors on the first column in the way described above.

Phase 3: $P_{i,1}$ ($1 \le i \le n$) transmits $a_1, \ldots, a_k$ to all the processors on the $i$th row in the same way.

Phase 4: Now each processor $P_{i,j}$ has $a_1, \ldots, a_k$, from which it computes $f(i,j) = a_1\{n(i-1)+j\}^{k-1} + a_2\{n(i-1)+j\}^{k-2} + \cdots + a_k \bmod m$. Then set $X_{i,j} = 1$ if $f(i,j) \in Z_m^1$ and $X_{i,j} = 0$ if $f(i,j) \in Z_m^0$.

Phase 5: If $X_{i,j} = 1$, then $P_{i,j}$ puts its packet on the row bus first in the following way: $P_{i,j}$ computes how many processors among $P_{i,1}, \ldots, P_{i,j-1}$ also write to the row bus first by simulating Phase 4 of each processor (we need local computation proportional to n here). If that number is $t$, then $P_{i,j}$ writes to the row bus at step $t + 1$. If $X_{i,j} = 0$, then $P_{i,j}$ puts its packet on the column bus first. It uses a similar calculation to decide when it should do so.
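
Phases 4 and 5 can be sketched as below. This is a hedged illustration rather than the authors' code: the function names are hypothetical, the choice that the lower half of {1, ..., m - 1} means "row bus first" is an assumption, and a hash value of 0 is treated here as "column bus first".

```python
def f(i, j, coeffs, n, m):
    """Polynomial of Phase 4 evaluated at the processor index n*(i-1)+j."""
    x = n * (i - 1) + j
    acc = 0
    for a in coeffs:                  # Horner's rule, highest power first
        acc = (acc * x + a) % m
    return acc


def writes_row_first(i, j, coeffs, n, m):
    """X_{i,j} = 1 when f(i,j) falls in the lower half (assumed convention)."""
    return 1 <= f(i, j, coeffs, n, m) <= (m - 1) // 2


def row_slot(i, j, coeffs, n, m):
    """Time-slot at which P_{i,j} writes to its row bus, or None."""
    if not writes_row_first(i, j, coeffs, n, m):
        return None
    # Simulate Phase 4 for the processors to the left on the same row.
    t = sum(1 for jj in range(1, j) if writes_row_first(i, jj, coeffs, n, m))
    return t + 1


if __name__ == "__main__":
    n, m = 5, 29                      # 29 is a prime larger than n*n = 25
    coeffs = [3, 1, 4, 1, 5, 9]       # k = 6 sample coefficients
    for i in range(1, n + 1):
        print([row_slot(i, j, coeffs, n, m) for j in range(1, n + 1)])
```

Because every processor evaluates the same polynomial, the row-first processors on a row obtain the consecutive slots 1, 2, 3, ... with no collisions and no idle steps in between.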

Stage 2. Note that all the processors can calculate when the first stage ends (by calculating the last processor that accesses the row or column bus in Stage 1). After the first stage is finished, all the processors enter Stage 2, which is exactly the same as Stage 2 of RR. One might ask why we cannot use the same technique as in Stage 1. Recall that there are approximately 0.5n packets on each bus at the beginning of this stage. If each processor knew the current positions or the destinations of all those packets on the bus, then we could use the same technique as before. Unfortunately, neither is known.

Theorem 2. For k = 6, RRk halts within $1.25n + o(n)$ steps with high probability.

Proof. Note that Phases 1 and 4 include only local computation. Also, as mentioned before, Phases 2 and 3 take only $O(\log n)$ steps. Therefore what we have to prove is that the number of packets written to a row bus first (and to a column bus first also) is sufficiently close to 0.5n and furthermore, at the beginning of Stage 2, the number of packets to move on each single bus is also close to 0.5n.

Fix some row, say row j. Let $P_i$ be the $i$th processor on this row and $X_i$ be the random variable whose value, 1 or 0, is determined at Phase 4. Let $X = \sum_{i=1}^{n} X_i$. Then it turns out [MR95] that the expected value of $X$ is 0.5n. What we wish to know is the probability that $X$ differs from 0.5n by a certain amount. Note that $X$ is a sum of (not necessarily independent) random variables, for which we cannot use the Chernoff bound. Instead we use the (generalized) Chebyshev bound: For an integer $k \ge 2$, let $\mu_X^k = E[(X - 0.5n)^k]$. (This is called the kth central moment, which does not exist for some probability spaces. It obviously exists in the present case.)

Lemma 4 (see [MR95] for example). For any $t > 0$,

$$\Pr\left[\,|X - 0.5n| > t\sqrt[k]{\mu_X^k}\,\right] \le \frac{1}{t^k}.$$

In order to prove this theorem, it is enough to consider only the case where k = 3 (the reason will be described later). Let us evaluate the third central moment $\mu_U^3 = E[(U - E[U])^3]$, where $U = \sum_{i=1}^{n} U_i$ is the sum of n 3-wise independent binary random variables. Expand $\mu_U^3 = E[(\sum_i U_i - E[\sum_i U_i])^3]$ and consider each term. Such a term involves up to three variables from the $U_i$'s. However, we can claim that terms involving more than one variable cancel each other. To see this, let $T$ be a term of the expansion that involves more than one variable. Then $T$ contains a variable $U_i$ that appears in $T$ exactly once. Thus, $T$ can be written in the form $E[T_1 U_i]$ or $E[T_2 E[U_i]]$, where $T_1$ or $T_2$ does not contain $U_i$. Note that the terms in these two forms (for fixed $U_i$) are in one-to-one correspondence, each $E[T_1 U_i]$ corresponding to $E[-T_1 E[U_i]]$. Due to the 3-wise independence we can write $E[T_1 U_i]$ as $E[T_1]E[U_i]$ and $E[-T_1 E[U_i]]$ as $-E[T_1]E[U_i]$ and thus cancel them out. The only remaining terms are of the form $E[U_i^3]$, $E[U_i^2]E[U_i]$, or $E[U_i]^3$, involving only one variable, with some constant coefficients. The contribution of these terms is a constant per variable and thus $O(n)$ in total.

Note that the 6-wise independent random variables $X_i$'s are also 3-wise independent. Using Lemma 4 with $\mu_X^3 = O(n)$,

$$\Pr\left[\,|X - 0.5n| > t\sqrt[3]{O(n)}\,\right] \le \frac{1}{t^3}.$$

If we set $t = n^{c - 1/3}$ for a positive constant $c$, then

$$\Pr[\,|X - 0.5n| > O(n^c)\,] \le \frac{1}{n^{3c-1}}.$$

This means $X \le 0.5n + o(n)$ with probability at least $1 - 1/n^{3c-1}$. Then it follows that the number of steps needed in Stage 1 for the fixed single bus is at most $0.5n + o(n) + O(\log n)$ with at least the same probability. This holds for all other buses.

For the analysis of Stage 2, we can apply the same argument as above since the random variables involved are 3-wise independent. By the above calculation we get $\mu_Y^3 = O(n)$, i.e.,

$$\Pr[\,|Y - 0.75n| > O(n^c)\,] \le \frac{1}{n^{3c-1}},$$

which says Stage 2 takes at most $0.75n + o(n)$ steps with probability at least $1 - 1/n^{3c-1}$ per bus.

Since there are 2n buses, the failure probability can be up to 2n times larger. However, we can still get a sufficiently small probability by setting $2/3 < c < 1$. As a result, RRk requires $1.25n + o(n)$ steps with high probability. □

5 Concluding Remarks 

It is known that routing can be done in 1.0n steps if all the processors know the source and destination addresses of all the packets (so-called off-line routing) [IMK96]. This 1.0n is also an absolute lower bound. The question is how close we can get to this bound in normal routing. Further improvements may be possible for Stage 2 of both RR and RRk and for the local computation in Stage 1 of RRk. Another interesting question is whether we can apply our randomization technique to improve the upper bound for the mesh equipped with both buses and local links.
Fig. 2. Two packets written on the row bus in Stage 1-1.

Fig. 3. The remaining packets written on the column bus in Stage 1-2.
Abstract. In real practice, a job sometimes can be divided into s independent tasks to be distributed for execution on a network with a fixed number of processors. The overall finish time can vary widely depending on variables such as latency, data partitioning and/or data combining times, the individual execution times, the amount of data to be transferred, and the sending out of more tasks than needed. This paper studies the problem of finding an optimal task scheduling for a divisible job such that the overall finish time is minimized.

We first prove the studied problem is NP-complete and give a simple 3-OPT approximation algorithm. Then we develop a (2 + ε)-OPT linear-time approximation algorithm by generalizing our simple algorithm, where ε is an arbitrarily small constant. A linear-time 2-OPT approximation algorithm is given when we divide the tasks evenly. Algorithms to find optimal solutions are then given for two special cases: 1) when the network has exactly two processors and 2) when the evenly divided tasks have symmetric behaviors. These cases happen frequently in real practice.



1 Introduction 

This paper studies how to solve a problem that needs intensive computational 
time on a network with a fixed number of processors. Such a problem can often 
be divided into a set of tasks with precedence constraints and communication 
overheads [3,13,16]. This paper studies the case when the task graph is a directed graph with one source, one sink and with every other vertex having an incoming edge from the source and an outgoing edge to the sink. Jobs that can be represented using this type of task graph occur frequently in practice and include matrix multiplication [17], and imaging, signal or pattern matching algorithms using hierarchical tree data structures (e.g., quad-trees) [6,15,19].

To model a network of processors, this paper uses a model proposed in [11]. 
This model is designed to represent a communication network of computers or 
workstations each with its own memory and microprocessor. A pipe-lined mes- 
sage sending (and receiving) cost with latency is used which is proportional to 
the message size. In such an environment, communication times are significant 
when tasks are located on different processors. The growth of such networks 
mandates more study into the efficient use of their parallel computing power. 
More importantly, the model we use is general enough to be used for any al- 
gorithm which can be represented as a set of tasks which communicate with 
each other and whose execution and communication costs are known or can be 
estimated. An example where such an algorithm would be helpful is a network 
of computers using PVM parallel software [8]. 

If the task precedence graph of the problem is represented as a one-level 
directed out- or in-tree, results are shown in [11,12]. 

Previous work on this problem was based on different models of parallel 
computations and some of them were very theoretical in nature [1,2,4,5,14,18]. 
This paper concentrates on solving problems on a network with a fixed number of processors. Previously, approximate solutions were given when the intermediate tasks have different execution times and for task graphs that are one-level trees [11,12]. We improve both the approximation factor and the running time of the
approximation algorithms presented in [12] by relating existing approximation 
results to ours. We further notice that in real applications the tasks fanning out 
have regular structures, i.e., their execution times are equal (e.g., the matrix 
multiplication problem). In many cases, the amount of data to be sent out and 
collected is also the same, which implies that the inter-processor communication 
times for tasks are likely to be the same. In this paper, we also show exact 
solutions under these constraints. 

2 Preliminaries 

2.1 Notation 

In this paper, we use the following notation. 

1. Let a job $J = \{t_0, t_1, \ldots, t_{n+1}\}$ be a set of tasks whose precedence constraints form a one-level send-receive graph $PC(J)$.

2. Given a task $t_i$, let $e_i$ be its execution time.

3. Given two tasks $t_i$ and $t_j$ with the precedence constraint that $t_i$ must be executed before $t_j$ can be executed, let $c_{i,j}$ be the communication time from $t_i$ to $t_j$ if they are not executed on the same processor. The communication time is 0 if $t_i$ and $t_j$ are executed on the same processor. All data streams are transmitted in a pipelined fashion, i.e., after $t_i$ starts sending, all data arrives at $t_j$ in $c_{i,j} + L$ units of time. If a task needs to send or receive two data elements at the same time, the two I/O operations must take place in sequence.

4. We schedule $J$ on $k$ uniform processors $P_r$, $0 \le r < k$, with the system I/O latency $L$. We assume the job starts at $P_0$.

Given a one-level send-receive graph, the number of intermediate vertices is called its fan-out. We call the tasks represented by intermediate vertices intermediate tasks. The tasks represented by the starting vertex and the terminating vertex are called the starting task and the terminating task, respectively. We call a job divisible if its task precedence graph is a one-level send-receive task graph. A divisible job is evenly divisible if the execution times of all intermediate tasks are equal. Two intermediate tasks in a one-level send-receive task graph are symmetrical if their execution times and communication times are equal. A one-level send-receive task graph is symmetrical if all tasks represented by intermediate vertices are symmetrical. A divisible job is symmetrically divisible if its task precedence graph is symmetrical. A scheduling $S$ for $J$ is an assignment of tasks to processors.

A legal realization for $S$ is an assignment of starting times for all tasks allocated to each processor such that it satisfies the precedence constraints and the I/O latency requirement. We also consider only realizations in which a processor that has work waiting to be done does it instead of idling. Note that a processor can be idle at some time if it is waiting for tasks allocated on other processors because of the precedence constraints. The makespan of a processor $P_i$ for a realization is the time at which $P_i$ finishes the execution, including the idle time and the communication, of all tasks allocated to it. The makespan of a legal realization is the largest makespan among all processors. The makespan of a divisible job is always equal to the makespan of $P_0$. A legal realization with the smallest makespan is a best realization. The makespan of a scheduling $S$ is the makespan of its best realization and is denoted as $M(S)$. An optimal scheduling is a scheduling with the smallest possible makespan. We further define $OPT_k(J)$ to be a scheduling for $J$ to be executed on $k$ processors with the optimal makespan. Hence $M(OPT_k(J))$ is the optimal makespan. Note that $t_0$ and $t_{n+1}$ are allocated to $P_0$.

An example of a realization for a scheduling on a divisible job $J = \{t_0, \ldots, t_5\}$ is shown in Figure 1.

Fig. 1. A realization for a scheduling on the right for the send-receive task graph on the left. Assume that tasks $t_0$, $t_3$ and $t_5$ are allocated on $P_0$. Tasks $t_1$, $t_2$ and $t_4$ are allocated on processors other than $P_0$.
2.2 NP-complete Results

Lemma 1 The problem of finding an optimal scheduling for a divisible job on a network with k processors is NP-complete if k > 1.

Proof. Assume $L = 0$, $e_0 = e_{n+1} = 0$, and $c_{0,i} = c_{i,n+1} = 0$ for all $1 \le i \le n$. Then this problem is reduced to the problem of finding a best makespan on k identical processors without precedence constraints [7, Problem SS8], which is NP-complete even when k = 2.

Lemma 2 [12] The problem of finding an optimal scheduling for an evenly divisible job on a network with k processors is NP-complete if k > 2.

3 Divisible Jobs 

This section presents approximation algorithms for the problem of finding an optimal scheduling for a divisible job on a network with a fixed number k of processors. The problem is NP-complete by Lemma 1 for any fixed k > 1. Previously, an $O(n \log n)$-time $(4 - 1/k)$-OPT algorithm was presented in [12].

3.1 A Simple 3-OPT Approximation Algorithm 

Let $J' = \{t_0, t_{n+1}\} \cup \{t_i \mid 1 \le i \le n$ and $c_{0,i} + c_{i,n+1} > e_i\}$. Let $J'' = \{t_i \mid 1 \le i \le n$ and $c_{0,i} + c_{i,n+1} \le e_i\}$. Hence $J'$ and $J''$ form a disjoint partition of $J$.

Lemma 3 

1. M [OPT k{iJ)) A Co + T 

2. An optimal seheduling for J is to sehedule all tasks on Pq if and only if for 

all tasks p G T 2-L T T co^i. 

3. Assume that exeeuting all tasks on Pq is not an optimal seheduling for J . 

Then M[OPTk{J)) > eo + e^+i + 2 L + e^. 

Proof. Part 2 is from [12]. We first prove part 1. In any optimal scheduling for 
all tasks in J' must be allocated on Tq, since otherwise we could reschedule 
them on Pq giving a smaller makespan. Note that cq. + if p G 

Among all possible schedulings, the smallest possible makespan for Pq is thus 
eo + e^+i + 

We now prove part 3. Given an optimal scheduling, let p be a task allocated 
on Tj, j > 0. Then the makespan of Pj is at least eo + + 2 L + e^. 

We define a valid k partition for $J$ to be a partitioning of the intermediate tasks in $J''$ into k disjoint subsets. Given a valid k partition, the value of a subset in the partition is the sum of the execution times of the tasks in the subset. The largest value of a valid k partition is the largest value of the subsets in the partition. We also pick an arbitrary subset in the partition with the largest value to be its largest subset. We further define $U_{k,J}$ to be the smallest largest value of all possible valid k partitions.
Lemma 4 $M(OPT_k(J)) \ge U_{k,J} \ge \max\{\max_{t_i \in J''} e_i,\ \sum_{t_i \in J''} e_i / k\}$.

Proof. In any optimal scheduling for $J$, tasks in $\{t_0, t_{n+1}\} \cup J'$ are executed on $P_0$. Tasks in $J''$ are distributed among the k processors, which form a valid k partition. The makespan of any scheduling is hence at least the largest value of the corresponding valid k partition. Thus $M(OPT_k(J)) \ge U_{k,J}$. The proof that $U_{k,J} \ge \max\{\max_{t_i \in J''} e_i,\ \sum_{t_i \in J''} e_i / k\}$ is straightforward.

We first prove two lemmas that are needed to derive our approximation algorithm.

Lemma 5 There is a linear-time algorithm to find a valid k partition for $J$ whose largest value is at most $2 \cdot U_{k,J}$.

Proof. Let $Q = \max\{\max_{t_i \in J''} e_i,\ \sum_{t_i \in J''} e_i / k\}$.

Algorithm VKP($J$, $Q$) /* Finding a valid k partition. */

1. Let sets $S_1 = \cdots = S_{2k} = \emptyset$; current := 1; full := $Q$;

2. for $i = 1$ to $n$ and $t_i \in J''$ do

   (a) while $e_i + \sum_{t_j \in S_{current}} e_j >$ full do
         current := current + 1;

   (b) $S_{current} := S_{current} \cup \{t_i\}$;

3. return $S_1 \cup S_2, S_3 \cup S_4, \ldots, S_{2k-1} \cup S_{2k}$ as a valid k partition.

4. end;

It is straightforward to see that the returned sets are a valid k partition and that the algorithm runs in linear time. If the value of current in Step 2a is at most $2 \cdot k$, then the largest value of the returned valid k partition is at most $2 \cdot Q$, which is at most $2 \cdot U_{k,J}$. We now prove that the value of current in Step 2a is at most $2 \cdot k$. Note that the value of $S_i \cup S_{i+1}$, $1 \le i <$ current, is at least $Q$ because of Step 2a. Hence if current $> 2 \cdot k$, then $\sum_{t_i \in J''} e_i > k \cdot Q \ge \sum_{t_i \in J''} e_i$, which is a contradiction. Hence the lemma is true.
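
A next-fit packing in the spirit of Algorithm VKP can be sketched as follows. This is a minimal sketch under stated assumptions, not the paper's implementation: the function name `vkp` is hypothetical, tasks are identified by their position in the input list, execution times are assumed to be non-negative integers so that comparisons against Q are exact, and an `if` replaces the `while` of Step 2a because an empty set always accepts a task with execution time at most Q.

```python
def vkp(exec_times, k):
    """Partition task indices into k subsets with largest value at most 2*Q.

    Q = max(largest execution time, total / k); all comparisons are scaled
    by k so that only integer arithmetic is needed.
    """
    total = sum(exec_times)
    largest = max(exec_times, default=0)
    q_times_k = max(k * largest, total)      # equals k * Q

    bins = [[]]                              # S_1, S_2, ... opened on demand
    load = 0                                 # value of the current set
    for idx, e in enumerate(exec_times):
        if (load + e) * k > q_times_k:       # task does not fit: open next set
            bins.append([])
            load = 0
        bins[-1].append(idx)
        load += e

    # The counting argument in the proof guarantees at most 2k sets.
    assert len(bins) <= 2 * k
    bins.extend([] for _ in range(2 * k - len(bins)))
    # Merge consecutive pairs S_1 u S_2, S_3 u S_4, ...
    return [bins[2 * j] + bins[2 * j + 1] for j in range(k)]


if __name__ == "__main__":
    print(vkp([5, 3, 8, 2, 7, 4, 6], k=3))
```

Each set is filled to at most Q before the next one is opened, so the union of two consecutive sets has value at most 2Q, and a single pass over the tasks suffices.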

Lemma 6 In Algorithm VKP, $2 \cdot M(OPT_k(J)) \ge 2 \cdot L + \max_j \sum_{t_i \in S_{2j-1} \cup S_{2j}} e_i$.

Proof. Let $S$ be an optimal scheduling whose corresponding valid k partition on tasks in $J''$ is $\{R_0, \ldots, R_{k-1}\}$, where $R_i$ is the set of tasks in $J''$ that are allocated on $P_i$. Let $V_i$ be the value of $R_i$. Let $W = \max_{1 \le i < k} V_i$. Note that $M(OPT_k(J)) \ge \max\{2 \cdot L + W, V_0\}$. By Lemma 5, $\max_j \sum_{t_i \in S_{2j-1} \cup S_{2j}} e_i \le 2 \cdot Q$. By the argument that the value of some subset in a partition must be at least its mean value, $W \ge Q$ if $V_0 \le Q$. If $W \ge Q$, then our lemma follows from Lemma 3(3); otherwise, we have the following cases. Note that we assume $W < Q$. Hence $V_0 > Q$. We want to prove $(2 \cdot L + 2 \cdot Q)/\max\{2 \cdot L + W, V_0\} \le 2$. Hence it suffices to prove $(2 \cdot L + 2 \cdot Q)/(2 \cdot L + W) \le 2$ or $(2 \cdot L + 2 \cdot Q)/V_0 \le 2$.

Case 1: $L \ge Q$. Hence $(2 \cdot L + 2 \cdot Q)/(2 \cdot L + W) \le 2$.

Case 2: $L < Q$ and $V_0 \ge 2 \cdot Q$. Hence $(2 \cdot L + 2 \cdot Q)/V_0 \le 2$.

Case 3: $L < Q$ and $V_0 = Q + \delta$, where $0 < \delta < Q$. Also by the argument that the value of some subset in a partition must be at least its mean value, $W \ge Q - \delta/(k-1)$.

Case 3.1: $L \ge \delta/(k-1)$. Then $(2 \cdot L + 2 \cdot Q)/(2 \cdot L + W) \le 2$.

Case 3.2: $L < \delta/(k-1)$. Then $(2 \cdot L + 2 \cdot Q)/(Q + \delta) = (2 \cdot L + 2 \cdot Q)/V_0 \le 2$.

Theorem 7 There is a linear-time 3-OPT approximation algorithm for the problem of finding an optimal scheduling of a divisible job on a fixed number k, k > 1, of processors.

Proof. Our algorithm works as follows.

Algorithm KD($J$)

1. Let $S'$ be the scheduling with all tasks allocated on $P_0$;
   Let $M'$ be the makespan of $S'$;

2. (a) Find a valid k partition $V$ for $J$ whose largest value is at most $2 \cdot U_{k,J}$;

   (b) Let $S''$ be the scheduling obtained by allocating the tasks in $\{t_0, t_{n+1}\} \cup J'$ and the largest subset in $V$ on $P_0$, and the rest of the subsets one each on $P_i$, $1 \le i < k$;

   (c) Let $M''$ be the makespan of an arbitrary realization of $S''$;

3. if $M' \le M''$, then return $S'$ else return $S''$;

4. end;

Using Lemma 3(2), Step 1 can be implemented in linear time. Let $Q = \max\{\max_{t_i \in J''} e_i,\ \sum_{t_i \in J''} e_i / k\}$. By Lemma 4, $Q \le M(OPT_k(J))$. By Lemma 5, Step 2a can be done in linear time.

We then prove the approximation bound of Algorithm KD. We assume that $S'$ is not an optimal scheduling; otherwise, our algorithm has found the optimal scheduling.

Assume wlog that $S_1 \cup S_2$ is the largest subset in $V$. The makespan of $P_0$ is one of the following:

eo + + y ^ + E'.+ E '■ (1) 

ti^g ^ ti^S\OS2 

60 T ^n+i T ^ ^ T + 2 -L T E '■ c^) 

Note that M[OPTk{J)) > By Lemma 5 and Lemma 4, — 

2-M(OPT/,(J)). By Lemma 3(1), eo + e^+i + Xl^ ^0,i T T C T 

M{OPTk{J))- Hence Equation 1 < 3-M(OPT/^(j7)). By Lemma 6, Equation 2 
is also < 3-M(OPT/^(j7)). 

Hence the theorem is true. 

3.2 Generalizations 

The problem of finding a valid k partition with the largest value $U_{k,J}$ is equivalent to the problem of scheduling n independent tasks on k identical machines without any communication costs or any partitioning and combining computation. This problem is well-studied [2,5].

It is well-known that computing $U_{k,J}$ is NP-complete since if we set $k = 2$, $L = 2$, $e_0 = e_{n+1} = 0$, and all communication times to be zero, then it reduces to the partition problem. Hence we give a linear-time 3-OPT approximation algorithm in Theorem 7. We remark that the 2-OPT approximation algorithm to approximate $U_{k,J}$ was first described in [9]. From the proof of Theorem 7, we observe the following.

Lemma 8 Let $J$ be a divisible job. If there is an approximation algorithm for finding a valid k partition whose largest value is within $r_1$ times $U_{k,J}$ in $O(t_1)$ time, then there is an $O(t_1 + n)$-time algorithm to find a scheduling for $J$ whose makespan is within $r_1 + 1$ times $M(OPT_k(J))$.

Proof. Straightforward from the proof of Theorem 7, especially Lemma 6.

Note that Lemma 8 says that a better approximation algorithm for the valid k partition problem leads to a better approximation algorithm for our problem of finding an optimal scheduling for a divisible job on k processors.

Theorem 9 There is a linear-time approximation algorithm for finding a scheduling of a divisible job $J$ on k processors whose makespan is arbitrarily close to $2 \cdot M(OPT_k(J))$.

Proof. From Lemma 8 and from the fact that there is an polynomial approx- 
imation scheme that runs in 0(ce*n)time for finding a valid k partition whose 
largest value is within [1 e)LJk^jj where 0 < e < 1 and = 0(2^^^^) [10]. 

4 Evenly and Symmetrically Divisible Jobs 

4.1 Evenly Divisible Jobs 

This subsection studies the optimal scheduling problem for an evenly divisible 
job on a fixed number k of processors. Lemma 2 shows that this problem is 
NP-complete if k > 2. 



Algorithm for Two Processors We solve the problem of finding a best scheduling for an evenly divisible job on a network of two processors $P_0$ and $P_1$. Without loss of generality, assume that $c_{0,i} + c_{i,n+1} \le c_{0,i+1} + c_{i+1,n+1}$, $1 \le i \le n - 1$.

Lemma 10 Let $S_w$ be a scheduling of $J$ with exactly $w$ intermediate tasks scheduled on $P_1$. Given any $w$, among all possible selections of $S_w$, the one which schedules tasks $t_1, \ldots, t_w$ on $P_1$ has the least makespan.

Proof. Given any $S_w$, consider the set of tasks allocated on $P_1$ and the set of intermediate tasks allocated on $P_0$. With both sums taken over the tasks allocated on $P_1$, the makespan of $S_w$ is either $e_0 + \sum (c_{0,i} + c_{i,n+1}) + (n - w) \cdot e + e_{n+1}$ or $e_0 + 2 \cdot L + \sum (c_{0,i} + c_{i,n+1}) + w \cdot e + e_{n+1}$. In both cases, the value $\sum (c_{0,i} + c_{i,n+1})$ is minimized if the tasks allocated on $P_1$ are $\{t_1, \ldots, t_w\}$. Hence the lemma holds.

Theorem 11 There is a linear-time algorithm to find an optimal scheduling for an evenly divisible job on two processors when the total communication times of intermediate tasks are sorted.

Proof. Our algorithm is based on Lemma 10. Let $S_w^*$ be the schedule realized by allocating the tasks $t_1, \ldots, t_w$ on $P_1$ and everything else on $P_0$. We first find the makespan for $S_n^*$ by allocating all tasks on $P_1$ except for task $t_0$. Then we incrementally obtain the makespan of $S_{n-1}^*$ from what we have for computing the makespan of $S_n^*$. We continue removing the largest task from $P_1$ and placing it on $P_0$ until we have obtained the makespan of $S_0^*$. An optimal scheduling is the one with the minimum makespan among $S_w^*$, $0 \le w \le n$, found above.



Algorithm for any k > 2 Processors

Theorem 12 There is a linear-time 2-OPT approximation algorithm to find a scheduling for an evenly divisible job $J$ on a network with k processors whose makespan is at most $2 \cdot M(OPT_k(J))$.

Proof. Since the execution times of non-root tasks are all $e$, some subset of every valid k partition has value at least $\lceil n/k \rceil \cdot e$. It is straightforward to find a valid k partition whose largest value is $\lceil n/k \rceil \cdot e$. Hence this theorem follows from Lemma 8.
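
The "straightforward" partition in this proof can be obtained by dealing the tasks out round-robin. The sketch below is an illustration under that assumption; the function name `even_partition` is hypothetical and tasks are numbered 1 to n.

```python
def even_partition(n, k):
    """Deal tasks 1..n round-robin into k subsets.

    Every subset receives either floor(n/k) or ceil(n/k) tasks, so with a
    common execution time e the largest value is ceil(n/k) * e.
    """
    parts = [[] for _ in range(k)]
    for t in range(1, n + 1):
        parts[(t - 1) % k].append(t)
    return parts


if __name__ == "__main__":
    print(even_partition(10, 3))
```

Since all execution times are equal, no partition can have a smaller largest value, so this partition is optimal for the evenly divisible case.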

When the number of available processors in the network is fixed, the problem 
is NP-hard by [12]. We first give a 3-OPT approximation algorithm. Note that 
the difference di equals We also assume for now that di > 0, 

1 < i < n. 



4.2 Symmetrically Divisible Jobs 

We first present the needed notation. Let $S$ be a scheduling for a symmetrically divisible job executed on a network of processors $P_0, \ldots, P_{k-1}$. Assume that the starting and terminating tasks are executed on $P_0$.

Lemma 13 A best realization of $S$ has the following properties.

- $P_0$ first executes the starting task, then spends time sending out data for the tasks not allocated on $P_0$. Finally, $P_0$ executes the intermediate tasks allocated on it, collects data sent from other processors, and then executes the terminating task.
- Processor $P_i$, $i > 0$, first receives data sent from $P_0$ and then immediately executes the first allocated task. It then sends output data back to $P_0$ and waits to receive data for the second allocated task sent from $P_0$. After the data is received, the second task is executed. The above continues until all tasks are executed.

Proof. Straightforward from the fact that all intermediate tasks are symmetric.

Assume that $w$ tasks are not allocated on $P_0$ and data for those tasks are sent out in the order $t'_1, \ldots, t'_w$. Given a best realization for $S$ as described in Lemma 13, we define its $i$th snap shot, $1 \le i \le w$, to be an assignment of times to the processors $P_j$, $j > 0$. The time assigned to each processor is the earliest time at which this processor finishes executing the tasks in $\{t'_1, \ldots, t'_i\}$ that are allocated on it, including their communication time. The time assigned in the $i$th snap shot to processor $P_j$, $j > 0$, is called the available time for $P_j$.

Lemma 14 There exists an optimal scheduling for a symmetrically divisible job whose best realization satisfies Lemma 13 and in which each intermediate task $t'_i$, $1 \le i \le w$, is allocated on the processor with the earliest available time in the $(i-1)$th snap shot.

Proof. Let $S$ be an optimal scheduling. Let $r$ be the least integer larger than 1 such that $t'_r$ is not allocated accordingly. Assume that $t'_r$ is allocated on $P_u$, whose available time is later than that of $P_v$. We obtain from $S$ a scheduling $S'$ such that the makespan of $S'$ is no larger than the makespan of $S$ using the following two rules. If no task in $\{t'_{r+1}, \ldots, t'_w\}$ is allocated on $P_v$, then $S'$ is the same as $S$ except that $t'_r$ is allocated on $P_v$. Otherwise, let $t'_s$, $s > r$, be the first intermediate task that is allocated on $P_v$ after the $(r-1)$th snap shot is taken. Then $S'$ is the same as $S$ except that $t'_r$ is allocated on $P_v$ and $t'_s$ is allocated on $P_u$. Because all intermediate tasks are symmetrical, the makespan of $S'$ is no larger than that of $S$.

Theorem 15 The optimal scheduling of a symmetrical one-level send-receive 
task graph to be executed on a network of k > 1 processors can be found in linear 
time. 

Proof. Since all intermediate tasks are symmetrical and by Lemmas 13 and 14,
an optimal scheduling can be found if we know the number of tasks that
are not allocated on $P_0$. Assume that the set $W$ of tasks not allocated on $P_0$ is
$\{t'_1, \ldots, t'_w\}$. In order not to keep track of the available time of each processor in
every snap shot, we use the following strategy. We allocate task $t'_i$, $1 \le i \le w$,
on processor $P_{i \bmod (k-1)}$. Since all tasks in $W$ are symmetrical, the available
time for processor $P_{i \bmod (k-1)}$, $i > 1$, is the earliest in the $(i-1)$th snap shot.
There are up to $m$ possible values for $w$, where $m$ is the number of intermediate
tasks. However, we first assume $w = m$. Let $p_i$ be the latest available time for
the $i$th snap shot. Using $p_i$, we can compute the makespan on $P_0$ when there are
exactly $i$ intermediate tasks not allocated on $P_0$.
1 Introduction

In this paper we study isomorphism between circulant graphs. Such graphs
have a vast number of applications to telecommunication networks, VLSI design
and distributed computation [4,13,15,17]. By suitably choosing the length
of the chord between two nodes of the network, one can achieve the appropriate
property: e.g., low diameter, high connectivity, or implicit routing. A network
that does provide labelled edges should be able to exploit the same properties
as one with different labelling if the underlying graphs are isomorphic.

For general graphs the problem is known to be in NP, not known to be in P,
and probably is not NP-complete, see Section 6 of [3]. It has been conjectured
by Adam [1] that for circulant graphs there is a very simple rule to decide the
isomorphism of two graphs. Although this rule is known to be false in general,
even for undirected graphs [9], for several special cases it holds [5,15,18,19,20].

The purpose of this paper is to extend essentially the class of graphs having 
the Adam property and even more general spectral Adam property as well as 
to introduce a new technique, which is based on some number theoretic results 
about equations in roots of unity, which we believe can be applied to several 
other questions of graph theory. In particular, we believe that our method can 
be applied to study general Cayley graphs, see [20]. Indeed, at least in the case 
of Cayley graphs generated by an abelian group, the corresponding eigenvalues 
are linear combinations of group characters, that is they are linear combinations 
of roots of unity, see Section 3.12 of [3] or Lemma 9.2 of [12]. 

The new approach, which we propose here, is based on the combination of the 
spectral technique from [8,9,15] with some deep results of algebraic number the- 
ory on linear equations in roots of unity [6,10,16,21,22,23]. In fact it can be 
extended to weighted circulant graphs as well. In particular, we settle the afore- 
mentioned Adam conjecture [1] for a wide class of circulant graphs which are 
not covered by the previously known results. 

We recall that an $n$-vertex circulant graph $G$ is a graph whose adjacency matrix
$A = (a_{ij})_{i,j=1}^{n}$ is a circulant. That is, $a_{ij} = a_{1,j-i+1}$, $i,j = 1, \ldots, n$. Hereafter,
subscripts are taken modulo $n$. We assume that $a_{ii} = 0$, $i = 1, \ldots, n$.

Wen-Lian Hsu, Ming-Yang Kao (Eds.): COCOON’98, LNCS 1449, pp. 251-260, 1998.
Therefore with every circulant graph one can associate a set $S \subseteq \mathbb{Z}_n$ of the
positions of non-zero entries of the first row of the adjacency matrix of the graph.
Respectively we denote by $\langle S \rangle_n$ the corresponding graph. We also recall that two
graphs $G_1$, $G_2$ are isomorphic, and write $G_1 \cong G_2$, if their matrices differ by a
permutation of their rows and columns. We say that two sets $S, T \subseteq \mathbb{Z}_n$ are
proportional, and write $S \sim T$, if $S = lT$ for some integer $l$ with $\gcd(l,n) = 1$,
where the multiplication is taken over $\mathbb{Z}_n$. Obviously, $S \sim T$ implies
$\langle S \rangle_n \cong \langle T \rangle_n$. For example, for $S = \{1,5\}$, $T = \{1,9\}$, and $n = 23$, we have
$\langle S \rangle_n \cong \langle T \rangle_n$ since $S \sim T$ ($l = 5$).
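
The definition of proportionality can be checked by brute force. The following Python sketch is only an illustration: it assumes that the example sets are read as generators of undirected graphs, that is, closed under negation modulo $n$, and the helper names `symmetric` and `proportionality_multiplier` are hypothetical, not taken from the paper.

```python
from math import gcd

def symmetric(gens, n):
    """Close a set of generators under negation modulo n (assumption: undirected graph)."""
    return frozenset(g % n for x in gens for g in (x, -x))

def proportionality_multiplier(S, T, n):
    """Return the smallest l with gcd(l, n) = 1 and S = l*T in Z_n, or None."""
    S = frozenset(s % n for s in S)
    for l in range(1, n):
        if gcd(l, n) == 1 and frozenset(l * t % n for t in T) == S:
            return l
    return None

S = symmetric({1, 5}, 23)   # {±1, ±5} in Z_23
T = symmetric({1, 9}, 23)   # {±1, ±9} in Z_23
print(proportionality_multiplier(S, T, 23))
```

With the symmetric closure the multiplier $l = 5$ of the example is found, since $5 \cdot 9 \equiv -1 \pmod{23}$.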

Adam [1] conjectured that the inverse statement is true as well. We say that a set
$S \subseteq \mathbb{Z}_n$ has the Adam property if for any other set $T \subseteq \mathbb{Z}_n$ the isomorphism
$\langle S \rangle_n \cong \langle T \rangle_n$ implies the proportionality $S \sim T$. Thus the Adam conjecture is
equivalent to the statement that all sets $S \subseteq \mathbb{Z}_n$ have the Adam property.

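The counterexample discovered in [9], given by the 6-element sets
$S_1 = \{\pm 1, \pm 2, \pm 7\} \subseteq \mathbb{Z}_{16}$ and $S_2 = \{\pm 1, \pm 6, \pm 7\} \subseteq \mathbb{Z}_{16}$, shows that the
Adam conjecture is false. It is easy to verify that the isomorphism
$\langle S_1 \rangle_{16} \cong \langle S_2 \rangle_{16}$ is furnished by the permutation on $\mathbb{Z}_{16}$ given by

$$i \mapsto \begin{cases} -5i, & \text{if } i \equiv 0 \pmod 2; \\ -5i - 4, & \text{if } i \not\equiv 0 \pmod 2; \end{cases}$$

but $S_1 \not\sim S_2$. In fact, counterexamples exist for any values of $n$ except, maybe, $n$
of the form $n = 2^{\alpha} 3^{\beta} m$, where $\alpha \in \{0, 1, 2, 3\}$, $\beta \in \{0, 1, 2\}$, $\gcd(m, 6) = 1$ and
$m$ is squarefree [2].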
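
The counterexample can be verified mechanically. The sketch below assumes the sets are $S_1 = \{\pm 1, \pm 2, \pm 7\}$ and $S_2 = \{\pm 1, \pm 6, \pm 7\}$ in $\mathbb{Z}_{16}$ and that the permutation sends even $i$ to $-5i$ and odd $i$ to $-5i - 4$; the function names are illustrative only.

```python
from math import gcd

n = 16
S1 = {1, 2, 7, 9, 14, 15}    # {±1, ±2, ±7} modulo 16
S2 = {1, 6, 7, 9, 10, 15}    # {±1, ±6, ±7} modulo 16

def edges(S, n):
    """Edge set of the undirected circulant graph generated by S in Z_n."""
    return {frozenset((i, (i + s) % n)) for i in range(n) for s in S}

def pi(i):
    """The permutation of Z_16 described in the text."""
    return (-5 * i) % n if i % 2 == 0 else (-5 * i - 4) % n

# pi must be a bijection that maps the edge set of <S1> onto that of <S2>.
is_bijection = sorted(pi(i) for i in range(n)) == list(range(n))
image = {frozenset(pi(v) for v in e) for e in edges(S1, n)}
is_isomorphism = is_bijection and image == edges(S2, n)

# Proportionality would need a unit l of Z_16 with l * S1 = S2.
is_proportional = any(
    gcd(l, n) == 1 and {l * s % n for s in S1} == S2 for l in range(1, n)
)
print(is_isomorphism, is_proportional)
```

The check confirms that the two graphs are isomorphic although no multiplier maps one set onto the other.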

Nevertheless, there are several very important families of circulant graphs for
which the Adam conjecture holds. For example, Muzychuk has shown that the
Adam conjecture is true for circulant graphs with a squarefree number of vertices
[18] and with a twice squarefree number of vertices [19]. It also holds for
4-element sets $S$ [7,11], see also [14]. The corresponding graphs are known as
double loops and have many applications to computer science. In fact it has been
discovered in several papers that, under some additional restrictions, the isomorphism
property of graphs can be replaced by the property of their isospectrality.

We recall that the spectrum $\mathrm{Spec}\, G$ of a graph $G$ is the set of the eigenvalues
of its adjacency matrix. In particular, isomorphic graphs have the same spectra
(although the inverse statement is obviously false). Respectively, we say that a
set $S \subseteq \mathbb{Z}_n$ has the spectral Adam property if for any other set $T \subseteq \mathbb{Z}_n$ the
isospectrality $\mathrm{Spec}\, \langle S \rangle_n = \mathrm{Spec}\, \langle T \rangle_n$ implies the proportionality $S \sim T$.

Here we describe a general class of sets having the spectral Adam property. For
example, it is shown in [15] that any 4-element set $S = \{\pm 1, \pm d\} \subseteq \mathbb{Z}_n$ (an
important sub-family of double loop circulant graphs) has the spectral Adam
property, provided that $2 < d < \min\{n/4, \varphi(n)/2\}$, where $\varphi(n)$ is the Euler
function. Here we settle the question almost completely, with only 5 possible
exceptions, for this type of graphs; for all of them $d > n/6$. For more general
sets of the form $S = \{\pm a, \pm b\}$ there are at most 12 possible exceptions. Moreover,
if $\gcd(n,12) < 3$ then there are no exceptions at all. We also show that for any
fixed $m$ the probability that a random $m$-element set $S \subseteq \mathbb{Z}_n$ does not have the
spectral Adam property, is

2 Auxiliary Results 

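Let us consider the equation

$$a_0 + \sum_{i=1}^{k-1} a_i \zeta^{w_i} = 0, \qquad w_1, \ldots, w_{k-1} \in \mathbb{Z}_n, \tag{1}$$

where $\zeta = \exp(2\pi i/n)$ and $a_0, \ldots, a_{k-1}$ are nonzero integers.

We call a solution $(w_1, \ldots, w_{k-1})$ of (1) irreducible if $\sum_{j \in J} a_j \zeta^{w_j} \ne 0$ for any
proper subset $J \subset \{1, \ldots, k-1\}$.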

Such equations and their various generalizations have been studied in the liter- 
ature a great deal [6,10,16,21,22,23]. We summarize the results of [6,16] in the 
following lemma. 

Lemma 1. For any irreducible solution of the equation (1), the fraction

$$Q = \frac{n}{\gcd(n, w_1, \ldots, w_{k-1})}$$

is squarefree and

$$\sum_{p \mid Q} (p - 2) \le k - 2,$$

where the sum is taken over all prime divisors of $Q$.

In particular, one can see that $Q = 1$ if all prime divisors of $n$ are greater than
$k$ and that $Q \le 2$ if $n$ is a power of 2. Let us denote

$$Q_k = \max\Big\{ \prod_{p} p \;\Big|\; \sum_{p} (p - 2) \le k - 2 \Big\},$$

where both the product and the sum are taken over distinct prime numbers.
Thus for the quantity $Q$ of Lemma 1 we have $Q \le Q_k$.
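
For small $k$ the quantity $Q_k$ can be computed by exhaustive search. The following Python sketch assumes the definition reads as the largest product of distinct primes $p$ with $\sum (p-2) \le k-2$; the names `primes_up_to` and `Q` are illustrative.

```python
def primes_up_to(x):
    """All primes p <= x, by trial division (enough for small x)."""
    return [p for p in range(2, x + 1)
            if all(p % q for q in range(2, int(p ** 0.5) + 1))]

def Q(k):
    """Largest product of distinct primes whose costs (p - 2) sum to at most k - 2."""
    best = 1

    def search(candidates, product, budget):
        nonlocal best
        best = max(best, product)
        for idx, p in enumerate(candidates):
            if p - 2 <= budget:
                # Use p once, then only consider larger primes.
                search(candidates[idx + 1:], product * p, budget - (p - 2))

    # Only primes p <= k can satisfy p - 2 <= k - 2.
    search(primes_up_to(max(k, 2)), 1, k - 2)
    return best

print([Q(2 * m) for m in range(2, 11)])
```

Only primes $p \le k$ can enter the product, so the search space is tiny for the values of $k$ used in this paper.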

From the known results on the distribution of prime numbers (see [6,22,23]) one
easily derives that $Q_k \le \exp\big((1 + o(1))\, k^{1/2} \log^{1/2} k\big)$.

It is easy to verify that for $S \subseteq \mathbb{Z}_n$, $\mathrm{Spec}\, \langle S \rangle_n = \big\{ \sum_{s \in S} \zeta^{ls} \;\big|\; l = 0, 1, \ldots, n-1 \big\}$.
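
The representation of the spectrum by sums of roots of unity is easy to evaluate numerically. The sketch below is an illustration under two assumptions: the set $S$ is closed under negation, so that all eigenvalues are real, and spectra are compared as multisets up to a small tolerance. The function names are hypothetical.

```python
import cmath

def circulant_spectrum(S, n):
    """Sorted eigenvalues sum_{s in S} zeta^(l*s), l = 0, ..., n-1, for symmetric S."""
    values = []
    for l in range(n):
        lam = sum(cmath.exp(2j * cmath.pi * ((l * s) % n) / n) for s in S)
        values.append(lam.real)   # imaginary parts cancel when S = -S
    return sorted(values)

def same_spectrum(S, T, n, tol=1e-9):
    """Compare the two spectra entry by entry after sorting."""
    return all(abs(x - y) < tol
               for x, y in zip(circulant_spectrum(S, n), circulant_spectrum(T, n)))

S1 = {1, 2, 7, 9, 14, 15}    # {±1, ±2, ±7} in Z_16
S2 = {1, 6, 7, 9, 10, 15}    # {±1, ±6, ±7} in Z_16
print(same_spectrum(S1, S2, 16))
```

Isomorphic graphs have the same spectrum, so the two 6-element sets over $\mathbb{Z}_{16}$ used here must pass the comparison.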

The following result is based on the previous representation and provides a
connection between circulant graphs and equations in roots of unity. It lies in the
background of the approach of [15] (see the proof of Theorem 4 from [15]).
Lemma 2. Let $S, T \subseteq \mathbb{Z}_n$ be such that $\mathrm{Spec}\, \langle S \rangle_n = \mathrm{Spec}\, \langle T \rangle_n$ but $S \not\sim T$. Then
there exists $l$, $1 \le l \le n-1$, such that the polynomial

$$F(X) = \sum_{s \in S} X^{s} - \sum_{t \in T} X^{lt}$$

is not identical to zero modulo $X^n - 1$ and $F(\zeta) = 0$.

We remark that in other words the polynomial does not vanish if one replaces
the exponents $lt$, $t \in T$, by their smallest positive residues modulo $n$.



3 General Estimates 

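Let us denote

$$\rho_m = \min_{1 \le K \le m} \max\left\{ \frac{1}{Q_{2K}},\; \frac{K-1}{Q_{m+K-\lceil m/K \rceil+1}} \right\}.$$

Obviously $\rho_m \ge 1/Q_{2m}$, thus we have the asymptotic inequality

$$\rho_m \ge \exp\Big(-\big(\sqrt{2} + o(1)\big)\, m^{1/2} \log^{1/2} m\Big),$$

which apparently can be shown to be tight in the sense $\log \rho_m \sim -\log Q_{2m}$,
$m \to \infty$. However, for smaller values of $m$ one can obtain better numerical
estimates than $\rho_m \ge 1/Q_{2m}$. For example: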



$m$           2     3      4      5      6       7       8       9       10
$\rho_m$      1/6   1/15   1/15   2/35   1/42    2/105   2/105   4/231   3/770
$1/Q_{2m}$    1/6   1/30   1/42   1/70   1/210   1/210   1/330   1/462   1/2310
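
The table can be recomputed exactly. The Python sketch below assumes that $\rho_m$ is the minimum over $1 \le K \le m$ of $\max\{1/Q_{2K},\, (K-1)/Q_{m+K-\lceil m/K\rceil+1}\}$, which is the form suggested by the proof of Theorem 3, and that $Q_k$ is the largest product of distinct primes $p$ with $\sum (p-2) \le k-2$. The names `Q` and `rho` are illustrative.

```python
from fractions import Fraction

def Q(k):
    """Largest product of distinct primes p with sum of (p - 2) at most k - 2."""
    primes = [p for p in range(2, max(k, 2) + 1)
              if all(p % q for q in range(2, int(p ** 0.5) + 1))]
    best = 1

    def search(start, product, budget):
        nonlocal best
        best = max(best, product)
        for idx in range(start, len(primes)):
            p = primes[idx]
            if p - 2 <= budget:
                search(idx + 1, product * p, budget - (p - 2))

    search(0, 1, k - 2)
    return best

def rho(m):
    """Assumed definition: min over K of max{1/Q_{2K}, (K-1)/Q_{m+K-ceil(m/K)+1}}."""
    candidates = []
    for K in range(1, m + 1):
        ceil_m_over_K = -(-m // K)
        first = Fraction(1, Q(2 * K))
        second = Fraction(K - 1, Q(m + K - ceil_m_over_K + 1))
        candidates.append(max(first, second))
    return min(candidates)

for m in range(2, 11):
    print(m, rho(m), Fraction(1, Q(2 * m)))
```

Exact rational arithmetic is used so that the output can be compared with the table entry by entry.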



Theorem 3. Let $S = \{s_1, \ldots, s_m\} \subseteq \mathbb{Z}_n$ be an $m$-element set which does not
satisfy the spectral Adam property. Then

$$\max_{1 \le i < j \le m} |s_i - s_j| \ge \rho_m n.$$



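Proof. From Lemma 2 we conclude that for some subsets $U \subseteq S$ and $V \subseteq T$
with $U \cap V = \emptyset$ we have

$$\sum_{u \in U} \zeta^{u} - \sum_{v \in V} \zeta^{v} = 0.$$

We split this equation into the maximal possible set of $r$, $m \ge r \ge 1$, subequations

$$\sum_{u \in U_\mu} \zeta^{u} - \sum_{v \in V_\mu} \zeta^{v} = 0, \qquad \mu = 1, \ldots, r,$$

where $U = \bigcup_{\mu=1}^{r} U_\mu$, $V = \bigcup_{\mu=1}^{r} V_\mu$ and $U_\mu \cap U_\nu = V_\mu \cap V_\nu = \emptyset$,
$1 \le \mu < \nu \le r$.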
Let $R$ be the set of $\mu = 1, \ldots, r$ for which $\#U_\mu \ge 2$. We put

$$L = \#R \quad\text{and}\quad M = \sum_{\mu \in R} \#U_\mu.$$

Because $\#U_\mu = \#V_\mu = 1$ is not possible, $\mu = 1, \ldots, r$, we see that there are
$r - L = m - M$ values of $\mu = 1, \ldots, r$ with $\mu \notin R$ and $\#V_\mu \ge 2$ for each of such
$\mu$. Therefore for any $\mu \in R$

$$\#V_\mu \le m - L - 2(m - M) + 1 = 2M - m - L + 1. \tag{2}$$



Denote $\Delta = \max_{1 \le i < j \le m} |s_i - s_j|$.

First of all let us select a pair $(U_\mu, V_\mu)$, $\mu \in R$, for which the total cardinality
$N = \#U_\mu + \#V_\mu$ is minimal. Select two arbitrary distinct elements $u_1, u_2 \in U_\mu$
with $0 < |u_1 - u_2| \le \Delta$. Dividing out the corresponding equation by $\zeta^{u_1}$ and
applying Lemma 1 we obtain $\Delta \ge n/Q_N$.

Now we select a pair $(U_\mu, V_\mu)$ for which the cardinality $K = \#U_\mu$ is
maximal. Then the selected subset $U_\mu \subseteq S$ contains at least one pair $u_1, u_2 \in U_\mu$
with $0 < |u_1 - u_2| \le \Delta/(K - 1)$. Dividing out the corresponding equation
by $\zeta^{u_1}$ and applying Lemma 1 together with (2) we obtain that

$$\Delta \ge \frac{n(K - 1)}{Q_{2M+K-m-L+1}}.$$



We have $K \ge M/L$, thus $L \ge \lceil M/K \rceil$ and we derive

$$Q_{2M+K-m-L+1} \le Q_{M+K-\lceil M/K \rceil+1} \le Q_{m+K-\lceil m/K \rceil+1}.$$

We also remark that $N \le 2M/L \le 2K$. Combining the above inequalities we
obtain the desired estimate. □



Similar arguments show that if the smallest prime divisor of $n$ is greater than $m$
then the spectral Adam property holds for all $m$-element sets $S = \{s_1, \ldots, s_m\} \subseteq \mathbb{Z}_n$.
It is also easy to see that in the case most interesting for applications, when
$n$ is a power of 2, sets $S = \{s_1, \ldots, s_m\} \subseteq \mathbb{Z}_n$ satisfy the Adam property if

$$\max_{1 \le i < j \le m} |s_i - s_j| < n/2.$$

Denote by $A_m(n)$ the number of $m$-element sets $S = \{s_1, \ldots, s_m\} \subseteq \mathbb{Z}_n$ which
do not satisfy the spectral Adam property.

Theorem 4. For any fixed $m$, the bound $A_m(n) =$ holds.
Proof. Let $S = \{s_1, \ldots, s_m\} \subseteq \mathbb{Z}_n$ be an $m$-element set which does not satisfy
the spectral Adam property. We use the same notations as in the proof of
Theorem 3. Then for every pair $u_1, u_2 \in U_\mu$ we have $\gcd(|u_1 - u_2|, n) \ge n/Q_{2m}$.
Therefore there are at least

$$\sum_{\mu \in R} \binom{\#U_\mu}{2} \ge M/2 \ge m/4$$

pairs $u_1, u_2 \in S$, $u_1 \ne u_2$, with $\gcd(|u_1 - u_2|, n) \ge n/Q_{2m}$. It is easy to see that
there are at most m-element sets $S \subseteq \mathbb{Z}_n$ satisfying this condition.

□



Obviously, this bound can be slightly improved as Ami'n) = O (n^ ^ 



4 Double Loops 



In this section we concentrate on double loop circulant graphs. That is, we consider
circulant graphs generated by sets $S = \{\pm a, \pm b\} \subseteq \mathbb{Z}_n$ with the condition that
$1 \le a < b < n/2$. We restrict ourselves to connected graphs, which is equivalent
to the solvability of the congruence $ax + by \equiv 1 \pmod n$, thus to the condition
$\gcd(a, b, n) = 1$. We define the following 4 special families of sets:

$W_e = \{\pm e, \pm(n/2 - e)\}$, $X_e = \{\pm e, \pm n/4\}$,

$Y_e = \{\pm e, \pm(n/3 - e)\}$, $Z_e = \{\pm e, \pm(n/6 - e)\}$.

First of all we need the following two simple lemmas which are based on elementary
number theory.

Lemma 5. Let integers $1 \le a < b < n/2$ be such that $\gcd(n, a, b) = 1$. If either
$n/\gcd(n, a+b)$ or $n/\gcd(n, b-a)$ divides 6, then the set $\{\pm a, \pm b\}$ is proportional
to one of the following 12 sets

$$W_1; \quad Y_e,\ e = \pm 1, 3; \quad Z_e,\ e = \pm 1, \pm 2, \pm 3, \pm 6. \tag{3}$$

Lemma 6. Let integers $1 \le a < b < n/2$ be such that $\gcd(n, a, b) = 1$. If both
$n/\gcd(n, 2a)$ and $n/\gcd(n, 2b)$ divide 6, then $n \mid 12$.
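
Lemma 6 lends itself to a quick exhaustive sanity check for small $n$. The Python sketch below is an illustration only; it assumes the range $1 \le a < b < n/2$ and uses hypothetical function names.

```python
from math import gcd

def lemma6_hypothesis(n, a, b):
    """True when gcd(n, a, b) = 1 and both n/gcd(n, 2a) and n/gcd(n, 2b) divide 6."""
    return (gcd(gcd(n, a), b) == 1
            and 6 % (n // gcd(n, 2 * a)) == 0
            and 6 % (n // gcd(n, 2 * b)) == 0)

def lemma6_violations(limit):
    """Triples (n, a, b), n <= limit, 1 <= a < b < n/2, where the hypothesis holds but n does not divide 12."""
    return [(n, a, b)
            for n in range(3, limit + 1)
            for a in range(1, (n - 1) // 2 + 1)
            for b in range(a + 1, (n - 1) // 2 + 1)
            if lemma6_hypothesis(n, a, b) and 12 % n != 0]

print(lemma6_violations(150))
```

An empty list is consistent with the lemma: the hypothesis gives $n \mid 12a$ and $n \mid 12b$, hence $n \mid 12\gcd(n, a, b) = 12$.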

Theorem 7. Any set $S = \{\pm a, \pm b\} \subseteq \mathbb{Z}_n$ with $\gcd(a, b, n) = 1$, except possibly
one of the following 12 exceptional sets

$$X_1; \quad Y_e,\ e = \pm 1, 3; \quad Z_e,\ e = \pm 1, \pm 2, \pm 3, \pm 6;$$

satisfies the spectral Adam property.
Proof. First of all we prove the theorem assuming that $S$ is neither one of the
sets (3) nor one of the sets $X_e$, $e = 1, 2, 4$. Then we will rule out some of the
remaining possibilities. The eigenvalues of $\langle \{\pm a, \pm b\} \rangle_n$ are

$$\zeta^{ka} + \zeta^{-ka} + \zeta^{kb} + \zeta^{-kb}
= 2\left( \cos\frac{2\pi ka}{n} + \cos\frac{2\pi kb}{n} \right)
= 4 \cos\left( \frac{\pi k}{n}(a + b) \right) \cos\left( \frac{\pi k}{n}(b - a) \right),$$

where $k = 1, \ldots, n$.

Assume that $\langle \{\pm a, \pm b\} \rangle_n$ has the eigenvalue $\lambda_1 = 0$. This happens only when
$n$ is even and, since $0 < b - a < n/2$, we must have $a + b = n/2$. By Lemma 5,
$\{\pm a, \pm b\} \sim W_1$. From now on we suppose that $\zeta^{a} + \zeta^{-a} + \zeta^{b} + \zeta^{-b} \ne 0$. Note that
if it happens that $\zeta^{b} + \zeta^{-b} = 0$ (respectively $\zeta^{a} + \zeta^{-a} = 0$) this forces $b = n/4$
(respectively $a = n/4$) and since $\gcd(a, n/4) = 1$, we see that $\{\pm a, \pm b\} \sim X_e$,
with some $e = 1, 2, 4$.

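From now on we suppose that $\mathrm{Spec}\, \langle S \rangle_n = \mathrm{Spec}\, \langle T \rangle_n$ where $T = \{\pm c, \pm d\} \subseteq \mathbb{Z}_n$,
that $S \not\sim T$ and that neither $S$ nor $T$ is one of the sets $X_e$, with some $e = 1, 2, 4$.
We conclude that there exists $k$, $1 \le k \le n$, such that

$$\zeta^{a} + \zeta^{-a} + \zeta^{b} + \zeta^{-b} - \zeta^{kc} - \zeta^{-kc} - \zeta^{kd} - \zeta^{-kd} = 0. \tag{4}$$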



First of all we show that if the sum in (4) contains a subsum of length 2 that
vanishes then this is of the form $\zeta^{t} + \zeta^{-t} = 0$ with $t = n/4$ or $t = 3n/4$. Indeed,
$\zeta^{s} + \zeta^{t} = 0$ is possible for $s = -t = n/2 \pm n/4$ only. If $\zeta^{t} - \zeta^{s} = 0$ for some
$t$, $s$ then $\{\pm a, \pm b\}$ and $\{\pm kc, \pm kd\}$ have at least two elements in common. It is
now easy to deduce that if this is the case, then the two sets have to coincide
and $\gcd(k, n) = 1$ in this case. Given the previous argument we have the following
5 possibilities:

i. The sum in (4) does not have any proper subsum that vanishes.

Lemma 1 implies that $n/\gcd(n, 2a, b - a, b + a)$ has to be a factor of 210. An
easy argument shows that $\gcd(n, 2a, b - a, b + a) = \gcd(n, 2a, 2b)$ which can be 1
or 2 since $\gcd(n, a, b) = 1$. Therefore $n$ is a divisor of 420. It is proven in [18,19]
that for values of $n$ that are squarefree or twice a squarefree number the spectral
Adam property holds.

ii. The sum in (4) splits as a vanishing sum with 6 terms plus a sum
with 2 terms and no other subsums vanish. Because sets $X_e$, $e = 1, 2, 4$,
are excluded we may assume that none of $a$ and $b$ equals $n/4$. Then the sum of
length 6 contains $\zeta^{a} + \zeta^{-a} + \zeta^{b} + \zeta^{-b}$ as a subsum and we can proceed as in the
case i, concluding that $n \mid 60$. For these values of $n$ the statement of the theorem
can be verified by the exhaustive search.

iii. The sum in (4) splits as the sum of three terms each vanishing, one
of length 2, and two of length 3. Once again we may assume that none of
$a$ and $b$ equals $n/4$. By a similar argument as above and applying Lemma 6, we
can assume that one of the two sums of length three is of the form $\zeta^{a} + \zeta^{b} - \zeta^{\tau}$
or $\zeta^{a} + \zeta^{-b} - \zeta^{\tau}$ with $\tau \in \{\pm kc, \pm kd\}$. By Lemma 1, we see that $n/\gcd(n, a \pm b)$
divides 6 and by Lemma 5 we have that $\{\pm a, \pm b\}$ is proportional to one of the
exceptional sets (3).

iv. The sum in (4) splits as the sum of two terms of length four each
vanishing and no other subsum vanishing. We can exclude all the sums
that contain at the same time either $\zeta^{a}$ and $\zeta^{-a}$ or $\zeta^{b}$ and $\zeta^{-b}$ since in this case
we would deduce that $n$ divides 240. However if $a$ and $b$ or $a$ and $-b$ belong
to the same sum of length 4 then in a similar way by Lemma 1 we see that
$n/\gcd(n, a \pm b)$ divides 6 and by Lemma 5 we have that $\{\pm a, \pm b\}$ is proportional
to one of the exceptional sets (3).

v. The sum in (4) splits as the sum of two terms, one of length 5
and one of length 3, each one vanishing. If the sum of length 5 contains
at least three elements of $\{\pm a, \pm b\}$ then we immediately deduce that $n \mid 60$ and
we have already ruled out these possibilities. If the sum of length 3 contains
three elements of $\{\pm a, \pm b\}$, then we come to the same conclusion. Therefore we
assume that both the sum of length 3 and the one of length 5 contain two
elements of $\{\pm a, \pm b\}$. By a similar analysis as has been made in the case iii, we
have that either $n/\gcd(n, a + b)$ or $n/\gcd(n, b - a)$ divides 6 or both $n/\gcd(n, 2a)$
and $n/\gcd(n, 2b)$ divide 6. By Lemma 5 and Lemma 6, we can conclude that
$\{\pm a, \pm b\}$ is proportional to one of the exceptional sets (3).

So we have proved the statement of the theorem for all sets $S$ except possibly
the sets (3) and $X_e$, $e = 1, 2, 4$. To reduce the number of exceptions let us define
$\mu_\sigma(S)$ as the multiplicity of $\sigma$ as an eigenvalue of $\langle S \rangle_n$. So $\mu_\sigma(S) = 0$ if $\sigma$ is not
an eigenvalue of $\langle S \rangle_n$.

It is also easy to see that $n/6 \le \mu_0(Z_e) \le n/6 + 6$, for $e = \pm 1, \pm 2, \pm 3, \pm 6$.

In the following table we present $\mu_0(S)$ of other potentially exceptional graphs
(together with the condition on $n$ for which these graphs exist and are connected):

$\langle S \rangle_n$          Conditions on $n$        $\mu_0(\langle S \rangle_n)$
$\langle W_1 \rangle_n$        $2 \mid n$               $n/2 + 2$ if $n \equiv 0 \pmod 8$; $n/2$ if $n \not\equiv 0 \pmod 8$
$\langle X_1 \rangle_n$        $n \equiv 0 \pmod 4$     1 if $n \equiv 0 \pmod 8$; 2 if $n \not\equiv 0 \pmod 8$
$\langle X_2 \rangle_n$        $n \equiv 4 \pmod 8$     1
$\langle X_4 \rangle_n$        $n \equiv 4 \pmod 8$     1
$\langle Y_1 \rangle_n$        $n \equiv 0 \pmod 3$     6 if $n \equiv 24 \pmod{36}$; 2 if $n \equiv 0, 12 \pmod{36}$; 0 if $n \not\equiv 0 \pmod{12}$
$\langle Y_{-1} \rangle_n$     $n \equiv 0 \pmod 3$     6 if $n \equiv 12 \pmod{36}$; 2 if $n \equiv 0, 24 \pmod{36}$; 0 if $n \not\equiv 0 \pmod{12}$
$\langle Y_3 \rangle_n$        $n \equiv 3 \pmod 9$     2 if $n \equiv 0 \pmod{12}$; 0 if $n \not\equiv 0 \pmod{12}$
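
Entries of this kind can be spot-checked with exact integer arithmetic. The sketch below assumes the product form $4\cos(\pi k(a+b)/n)\cos(\pi k(b-a)/n)$ of the eigenvalues of $\langle\{\pm a, \pm b\}\rangle_n$ and uses the fact that $\cos(\pi x/n) = 0$ exactly when $2x \equiv n \pmod{2n}$; the function name `mu0` is illustrative.

```python
def mu0(a, b, n):
    """Multiplicity of the eigenvalue 0 of the circulant graph generated by {±a, ±b} in Z_n."""

    def cos_vanishes(x):
        # cos(pi * x / n) == 0  iff  2x is congruent to n modulo 2n
        return (2 * x) % (2 * n) == n

    return sum(1 for k in range(n)
               if cos_vanishes(k * (a + b)) or cos_vanishes(k * (b - a)))

# A few instances: W_1 = {±1, ±(n/2 - 1)}, X_1 = {±1, ±n/4}, Y_1 = {±1, ±(n/3 - 1)}.
print(mu0(1, 7, 16))   # W_1, n = 16
print(mu0(1, 3, 12))   # X_1, n = 12
print(mu0(1, 7, 24))   # Y_1, n = 24
```

Working modulo $2n$ avoids floating-point comparisons with zero, which would otherwise require a tolerance.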
For $n \le 18$ the theorem can be proved by the exhaustive search. For $n > 18$ we
see that the spectrum of $\langle W_1 \rangle_n$ is unique, so it cannot be an exceptional set. The
sets $X_2$ and $X_4$ when $n \equiv 4 \pmod 8$ have both $\mu_0(X_2) = \mu_0(X_4) = 1$ (but not
in the case when $\mu_0(X_1) = 1$). However $\mu_2(X_2) = 0$ while $\mu_2(X_4) = 2$. So the
two graphs cannot be isomorphic and their spectra are distinct from spectra of
all other possible exceptions. So $X_2$ and $X_4$ can be eliminated from the list of
exceptions as well. □

So we have 12 sets of the form $\{\pm a, \pm b\}$ with $\gcd(a, b, n) = 1$ which possibly do not satisfy
the spectral Adam property, and only 5 of them are of the form $\{\pm 1, \pm d\}$, the
case of main interest.



5 Remarks 

Naively speaking one can expect $12 \times 11/2 = 66$ ‘suspicious’ pairs $\mathrm{Spec}\, \langle S \rangle_n =
\mathrm{Spec}\, \langle T \rangle_n$ for which $S \not\sim T$. However, by comparing the multiplicities of zero
eigenvalues we see that sets $Z_e$, $e = \pm 1, \pm 2, \pm 3, \pm 6$, cannot form a suspicious
pair with sets $X_1$, $Y_e$, $e = \pm 1, 3$. Hence there are only $4 \times 3/2 + 8 \times 7/2 = 34$
suspicious pairs. Similarly, we have only $3 \times 2/2 + 2 \times 1/2 = 4$ suspicious pairs
of sets of the form $\{\pm 1, \pm d\}$. In fact, the number of suspicious pairs can be
essentially reduced if one computes $\mu_0(Z_e)$ precisely for each $e = \pm 1, \pm 2, \pm 3, \pm 6$.
Other eigenvalues may help as well. It is also easy to see that if $\gcd(n, 12) < 3$
then there are no exceptions at all.

Here instead of using the isomorphism property of graphs our approach is based 
on a weaker property of their spectral identity. Thus our results are more general 
than the original Adam conjecture. 

One can probably extend our results to the case of weighted circulant graphs.
Indeed, in this case the question can be reduced to an equation of the form (1)
as well; however, instead of $\pm 1$-coefficients it will have coefficients depending on
the weights. For integer weights one can use Lemma 1 and obtain essentially
the same results. For graphs with algebraic weights one can use more general
results of [22,23]. This case can also be considered without attracting any new
ideas. For equations in roots of unity with complex coefficients very general
and strong results applicable to equations with arbitrary complex coefficients
are available [10,21], however it is still not quite clear how to extract analogues
of Theorems 3, 4 and 7.

One more possible generalization which can be tackled by our method is studying
circulant graphs for which spectra have large intersection. It seems that for any
$\varepsilon > 0$ one can obtain some non-trivial conclusions about sets $S, T \subseteq \mathbb{Z}_n$ such
that the spectra of $\langle S \rangle_n$ and $\langle T \rangle_n$ have at least $n^{\varepsilon}$ common elements, that is
$\#\big(\mathrm{Spec}\, \langle S \rangle_n \cap \mathrm{Spec}\, \langle T \rangle_n\big) \ge n^{\varepsilon}$.

Finally we remark that we hope that our approach will be useful for some other
types of graphs, including Cayley graphs.
Abstract. The graph 3-coloring problem arises in connection with certain
scheduling and partition problems. As is well known, this problem is
NP-complete and therefore intractable in general unless NP = P. The
present paper is devoted to the 3-coloring problem on a large class of
graphs, namely, graphs containing no fully odd $K_4$, where a fully odd
$K_4$ is a subdivision of $K_4$ such that each of the six edges of the $K_4$ is
subdivided into a path of odd length. In 1974, Toft conjectured that every
graph containing no fully odd $K_4$ can be vertex-colored with three
colors. The purpose of this paper is to prove Toft’s conjecture.



1 Introduction 

A graph $G$ is said to be $n$-colorable if there is an assignment of $n$ colors, $1, 2, \ldots, n$,
to the vertices of $G$ so that no two adjacent vertices have the same color; the
chromatic number of $G$ is the minimum $n$ for which $G$ is $n$-colorable. The graph-coloring
problem is to determine the chromatic number of a given graph. As is
well known, this NP-hard problem has many potential real-world applications.
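
For very small graphs the two definitions can be tested directly by backtracking. The Python sketch below is an illustration only (exponential time in general, as expected for an NP-hard problem); graphs are given as dictionaries mapping each vertex to the set of its neighbours, and the function names are hypothetical.

```python
def is_colorable(adj, n):
    """True if the graph adj (vertex -> set of neighbours) has a proper coloring with colors 1..n."""
    vertices = list(adj)
    color = {}

    def extend(idx):
        if idx == len(vertices):
            return True
        v = vertices[idx]
        for c in range(1, n + 1):
            # c is admissible if no already colored neighbour uses it
            if all(color.get(u) != c for u in adj[v]):
                color[v] = c
                if extend(idx + 1):
                    return True
                del color[v]
        return False

    return extend(0)

def chromatic_number(adj):
    """Smallest n for which the graph is n-colorable."""
    n = 0
    while not is_colorable(adj, n):
        n += 1
    return n

K4 = {v: {u for u in range(4) if u != v} for v in range(4)}
C5 = {v: {(v - 1) % 5, (v + 1) % 5} for v in range(5)}
print(chromatic_number(K4), chromatic_number(C5))
```

The complete graph $K_4$, which is itself a fully odd $K_4$ (every edge is a path of length one), is not 3-colorable, while the 5-cycle is.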

Graph coloring has a long history, and from the beginning it has been closely
tied to the famous four-color conjecture which can be stated in terms of graph
minors. A graph $H$ is a minor of a graph $G$ if $H$ can be obtained from a subgraph
of $G$ by contracting edges; a graph $G$ is a subdivision of a graph $H$ if $G$ can be
obtained from $H$ by replacing the edges of $H$ with internally disjoint paths,
each containing at least one edge. Hadwiger [5] conjectured that every graph
with no minor isomorphic to $K_{n+1}$ (the complete graph with $n + 1$ vertices) is
$n$-colorable; Hajós [6] further conjectured that every graph containing no subdivision
of $K_{n+1}$ is $n$-colorable. These two conjectures hold trivially for $n = 1$ and
2; they are equivalent when $n = 3$ and the validity of this case was established
by Dirac [4]. Hadwiger’s conjecture is equivalent to the four-color theorem for
$n = 4$ and 5 [14,1,9,1^]; it remains unsolved for $n \ge 6$. Hajós’ conjecture was
disproved by Catlin [2] for $n \ge 6$; it is still open for $n = 4$ and 5; its validity for
$n = 4$ would imply the four-color theorem.

The problem of deciding if a graph is 3-colorable is called the graph 3-coloring
problem, which is related to the four-color conjecture and arises in connection
with certain scheduling and partition problems. Since this problem is
NP-complete and therefore intractable in general unless NP = P, it would be
interesting to consider some special classes of graphs.

A fully odd $K_4$ is a subdivision of $K_4$ such that each of the six edges of the
$K_4$ is subdivided into a path of odd length. In 1974, Toft [13], in an attempt
to strengthen Dirac’s theorem, conjectured that every graph containing no fully
odd $K_4$ can be vertex-colored with three colors. Motivated by Toft’s conjecture,
various authors have obtained several results in the past two decades. Let $G$
be an arbitrary graph with chromatic number four. Catlin [2] proved that $G$
contains a subdivision of $K_4$ such that each triangle of the $K_4$ is subdivided
to form an odd cycle; Krusenstjerna-Hafstrøm and Toft [8] showed that $G$ contains
a subdivision of $K_4$ such that each of the three edges of a $K_3$ in the $K_4$ is
subdivided into a path of odd length; Thomassen and Toft [12] verified that $G$
contains a subdivision of $K_4$ such that three edges of an arbitrary spanning tree
of the $K_4$ can be left undivided, corresponding to paths of length one. Jensen
and Shepherd [7] confirmed Toft’s conjecture for line graphs. The purpose of this
paper is to prove Toft’s conjecture.

THEOREM Every graph containing no fully odd $K_4$ is 3-colorable.

Our proof relies on decomposition methods, that is, we shall first decompose 
the graph in consideration into some nice smaller graphs, and then turn to color 
each smaller graph with three colors. By piecing together the colorings on these 
smaller graphs, we shall get a proper 3-coloring of the original graph. For ease 
of exposition, let us prove by contradiction. 

2 Proof: Outline 

Throughout, let G stand for a counterexample with the smallest number of 
vertices: 

— $G$ has no fully odd $K_4$,

— $G$ is not 3-colorable,

— every graph with no fully odd $K_4$ and with fewer vertices than $G$ is 3-colorable.

A mixed $K_4$ is a graph obtained from $K_4$ by inserting an odd number of new
vertices of degree two into each of the three edges incident to a certain vertex
and by inserting an even number of new vertices of degree two into each of the
other edges.
Lemma 2.1. $G$ contains a mixed $K_4$.

Mixed $K_4$'s are important intermediate structures, which play crucial roles
in our proof. We shall use repeatedly the following conventions of notation and
terminology with regard to a mixed $K_4$ obtained from a $K_4$ with vertex-set
$\{v_0, v_1, v_2, v_3\}$.



Standard labeling of a mixed $K_4$:

— $T_1$ denotes the path linking $v_0$ and $v_1$;
this path has an even length; its vertices are $v_0, a_1, a_2, a_3, \ldots, v_1$;
we set $A_1 = \{a_i : i \text{ is odd}\}$, $A_2 = \{a_i : i \text{ is even}\}$, and $A = A_1 \cup A_2$.

— $T_2$ denotes the path linking $v_0$ and $v_2$;
this path has an even length; its vertices are $v_0, b_1, b_2, b_3, \ldots, v_2$;
we set $B_1 = \{b_i : i \text{ is odd}\}$, $B_2 = \{b_i : i \text{ is even}\}$, and $B = B_1 \cup B_2$.

— $T_3$ denotes the path linking $v_0$ and $v_3$;
this path has an even length; its vertices are $v_0, c_1, c_2, c_3, \ldots, v_3$;
we set $C_1 = \{c_i : i \text{ is odd}\}$, $C_2 = \{c_i : i \text{ is even}\}$, and $C = C_1 \cup C_2$.

— $\pi_1$ denotes the path linking $v_2$ and $v_3$;
this path has an odd length; its vertices are $v_2, x_1, x_2, x_3, \ldots, v_3$;
we set $X_1 = \{x_i : i \text{ is odd}\}$, $X_2 = \{x_i : i \text{ is even}\}$, and $X = X_1 \cup X_2$.

— $\pi_2$ denotes the path linking $v_1$ and $v_3$;
this path has an odd length; its vertices are $v_1, y_1, y_2, y_3, \ldots, v_3$;
we set $Y_1 = \{y_i : i \text{ is odd}\}$, $Y_2 = \{y_i : i \text{ is even}\}$, and $Y = Y_1 \cup Y_2$.

— $\pi_3$ denotes the path linking $v_1$ and $v_2$;
this path has an odd length; its vertices are $v_1, z_1, z_2, z_3, \ldots, v_2$;
we set $Z_1 = \{z_i : i \text{ is odd}\}$, $Z_2 = \{z_i : i \text{ is even}\}$, and $Z = Z_1 \cup Z_2$.

Observe that $G$ enjoys many nice properties in the presence of a mixed $K_4$,
some of which are given below, where a path is called $\mathcal{K}$-external if all the internal
vertices of this path are outside $\mathcal{K}$.

Lemma 2.2. Let $\mathcal{K}$ be an arbitrary mixed $K_4$ with a standard labeling. Then
the following statements hold:

(i) Every $\mathcal{K}$-external path linking any two of $A_1$, $B_1$, and $C_1$ is of odd length;

(ii) Every $\mathcal{K}$-external path linking $A_1$ and $B_2 \cup C_2$ is of even length;

(iii) Every $\mathcal{K}$-external path linking $A_1$ and $Y_1 \cup Z_1$ is of even length;

(iv) Every $\mathcal{K}$-external path linking $A_1$ and $v_1$ is of even length;

(v) No component of $G - \mathcal{K}$ is adjacent to all three of $A_1$, $B_1$, and
$C \cup X_1 \cup Y_1 \cup \{v_3\}$;

(vi) If there is a $\mathcal{K}$-external path linking any two of $A_1$, $B_1$, and $C_1$, then
there is no $\mathcal{K}$-external path linking any other two of $A_1$, $B_1$, and $C_1$.

To outline the proof, we still need one definition.

To outline the proof, we still need one definition. 

Definition 2.1. A path $T_i$ in a mixed $K_4$ $\mathcal{K}$ (with a standard labeling) is said
to meet the parity requirement with respect to $\mathcal{K}$ if $T_i$ satisfies the following two
conditions:

(a) For any two vertices $u$ and $v$ on $T_i$ such that $T_i[u,v]$ is of even length, $u$ and
$v$ are nonadjacent in $G$;

(b) For any component $H$ of $G - \mathcal{K}$ which is adjacent to no vertex in $\mathcal{K} - T_i$,
every $H$-internal path linking any two vertices $u$ and $v$ on $T_i$ has the same
parity as $T_i[u,v]$.

Lemma 2.3. In every mixed $K_4$ $\mathcal{K}$ with a standard labeling, at least one of
$T_1$, $T_2$, and $T_3$ meets the parity requirement with respect to $\mathcal{K}$.

In completing our proof of the theorem, symmetry allows us to distinguish
among three cases:

Case 1. There is a mixed $K_4$ $\mathcal{K}$ with a standard labeling such that $A_1$
and $B_1$ are linked by some $\mathcal{K}$-external path and such that $T_3$ meets the parity
requirement with respect to $\mathcal{K}$.

Case 2. For each mixed $K_4$ $\mathcal{K}$ with a standard labeling such that $A_1$ and
$B_1$ are linked by some $\mathcal{K}$-external path, $T_3$ does not meet the parity requirement
with respect to $\mathcal{K}$.

Case 3. For each mixed $K_4$ $\mathcal{K}$ with a standard labeling, there is no $\mathcal{K}$-external
path linking any two of $A_1$, $B_1$, and $C_1$.

These three cases are the subjects of the next three sections of our paper. Using
decomposition methods, we shall reach a contradiction in each case. Let us sketch
the proof of the first case now.



3 Proof: Case 1 

The assumption of this section: there is a mixed $K_4$ $\mathcal{K}$ with a standard labeling
such that $A_1$ and $B_1$ are linked by some $\mathcal{K}$-external path and such that $T_3$ meets
the parity requirement with respect to $\mathcal{K}$.

Lemma 3.1. There is no $\mathcal{K}$-external path of odd length linking $C_1$ and
$\mathcal{K} - (C_2 \cup \{v_0, v_3\})$.

Since $T_3$ meets the parity requirement with respect to $\mathcal{K}$, by definition both
$C_1$ and $C_2 \cup \{v_0, v_3\}$ are stable sets.

Definition 3.1. Let $G^*$ denote the graph obtained from $G$ by shrinking $C_1$ and
shrinking $C_2 \cup \{v_0, v_3\}$.

We are going to prove that $G^*$ is 3-colorable, and so (as 3-colorability of $G^*$
implies 3-colorability of $G$) obtain the desired contradiction.

Definition 3.2. Let $\mathcal{K}^*$ denote the graph obtained from $\mathcal{K}$ by shrinking $C_1$ and
shrinking $C_2 \cup \{v_0, v_3\}$.
Lemma 3.2. Let $Q$ be a $\mathcal{K}$-external path in $G$ such that

• the length of $Q$ is odd,

• one end, $c_{2k-1}$, of $Q$ is in $C_1$,

• the other end, $w$, of $Q$ is outside $\mathcal{K}$.

Then $w$ is adjacent to at most two vertices of $\mathcal{K}$ other than $C_1$. Furthermore, if
$w$ is adjacent to precisely two vertices of $\mathcal{K}$ other than $C_1$ then these two vertices
may be labeled $u$ and $v$ so that one of the following conditions is satisfied:

(i) $u$ is in $A_2$, $v$ is in $B_2$;
$u$ is adjacent to no vertex in $B_2$ and $v$ is adjacent to no vertex in $A_2$;

(ii) $u$ is in $X_1$, $v$ is in $Y_1$;
$u$ is adjacent to no vertex in $Y_1$ and $v$ is adjacent to no vertex in $X_1$.

Let us construct an equivalence relation on the set of vertices of $\mathcal{K}^*$ by the
following algorithm:

Algorithm 3.1

Let each vertex of $\mathcal{K}^*$ form an equivalence class.
while there is a path $Q$ in $G^*$ such that

• the length of $Q$ is odd,

• one end of $Q$ is $C_1$,

• all the other vertices are outside $\mathcal{K}^*$,

• the end of $Q$ that is outside $\mathcal{K}^*$ is adjacent to
precisely two equivalence classes, $U$ and $V$,
other than $C_1$

do merge $U$ and $V$.
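
The merging loop of Algorithm 3.1 can be sketched with a union-find structure. The Python code below is only an illustration under stated assumptions: the graph is given as a dictionary of neighbour sets, `core` is the vertex set of the shrunken mixed $K_4$, `c1` is the vertex obtained by shrinking $C_1$, and odd paths are found by brute-force enumeration of simple paths, which is adequate only for tiny examples. None of these names come from the paper.

```python
def algorithm_3_1(adj, core, c1):
    """Return the equivalence classes on `core` produced by the merging loop."""
    parent = {v: v for v in core}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]   # path halving
            v = parent[v]
        return v

    # Outside end vertices of odd simple paths that start at c1 and
    # otherwise avoid the core.
    odd_ends = set()

    def walk(v, length, seen):
        for w in adj[v]:
            if w in core or w in seen:
                continue
            if (length + 1) % 2 == 1:
                odd_ends.add(w)
            walk(w, length + 1, seen | {w})

    walk(c1, 0, {c1})

    merged = True
    while merged:
        merged = False
        for w in odd_ends:
            # classes adjacent to w, not counting the class of c1
            classes = {find(u) for u in adj[w] if u in core and u != c1}
            if len(classes) == 2:
                u, v = classes
                parent[u] = v
                merged = True

    groups = {}
    for v in core:
        groups.setdefault(find(v), set()).add(v)
    return sorted(sorted(g) for g in groups.values())

adj = {"C1": {"w"}, "w": {"C1", "u", "v"}, "u": {"w"}, "v": {"w"}, "x": set()}
print(algorithm_3_1(adj, {"C1", "u", "v", "x"}, "C1"))
```

In this toy input the path from `C1` to `w` has length one, and `w` sees exactly the two classes of `u` and `v`, so these are merged while `x` stays alone.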

Lemma 3.3. Each equivalence class produced by Algorithm 3.1 is a stable set 
in G. 

Definition 3.3. Let Gq denote the graph obtained from G^ by shrinking each 
equivalence class produced by Algorithm 3.1. 

Our task is to prove that O'* is 3- color able; we shall actually prove that G^ 
is 3-colorable; by Lemma 3.3, 3-colorability of G^ implies 3-colorability of O'*. 

Definition 3.4. Let $F_1, F_2, \ldots, F_k$ denote all the components of $G^* - X^*$ that are adjacent to $C_1$.

Note that both $C_1$ and $C_2 \cup \{v_0, v_3\}$ are vertices of $G_0$. From Lemma 3.1, it follows that

$C_2 \cup \{v_0, v_3\}$ is the only neighbor of $C_1$ in $G_0 - (F_1 \cup F_2 \cup \ldots \cup F_k)$. (1)

Lemma 3.4. $G_0 - (F_1 \cup F_2 \cup \ldots \cup F_k)$ contains no fully odd $K_4$.

By Lemma 3.4 and the assumption on $G$, $G_0 - (F_1 \cup F_2 \cup \ldots \cup F_k)$ admits a 3-coloring $\varphi$; switching colors if necessary, we may assume that

$\varphi(C_2 \cup \{v_0, v_3\}) = 1$ and $\varphi(C_1) = 2$. (2)

We are going to extend $\varphi$ into a 3-coloring of $G^*$ by coloring each $F_i$ with $1 \le i \le k$ separately.

Lemma 3 . 5 . For each i with 1 < i < k, there are disjoint stable subsets D2 
ofV{Fi) and a partition ofV{Fi) into sets W (x e DiU D2) with the following 
properties: 

(i) Vx contains x, is the union of some blocks of Fi 
if it has at least two vertices, and W is connected; 

(a) Vx and Vy are disjoint and 

Vx — {x} and Vy — {y} are nonadjacent whenever x ^ y; 

(Hi) Di U D2 induces a connected bipartite subgraph, 
which is the union of some blocks of Fi; 

(iv) V {Fi) — D2 is nonadjacent to F — {G2 U 

(v) For each x in D2, all the neighbors of x in F — Gi have the same color in 

Now we are ready to construct a 3 -coloring of Fi which is compatible 
with (p for each 1 < i < k. We shall actually construct a 3 -coloring px of each 
subgraph of Fi induced by W so that the union of all cp^’s yields pi. 

Description: Let C2k-i be a vertex in Gi which is adjacent to some vertex 

u? in D2- 

To construct px when x E Di, consider the graph Gx obtained from the 
subgraph of G induced by W U by removing all the vertices in G± — {c2A;-i} 
and then identifying all the vertices in each of the four sets {uq} U {c2^ : i < k}, 
{^3} bJ {c2i : i > k}, {c2A;-i,^}. It can be shown that Gx contains 

no fully odd K4. Since Gx has fewer vertices than G, Gx has a 3 -coloring 0 ^; 
trivially (j>x{x), 4>x{G2 U {1^07^3}) distinct colors; without loss of generality, 
(>x{x) = 2 , 4 >x{G 2 U {t^oWs}) = 1 * The restriction of (jx on Vx is px^ 

To construct px when x G 1^2, we distinguish between two cases. 

Case 1 . 1 . X has no neighbor in F — with color 1 or 3 . 

Consider the graph Gx obtained from the subgraph of G induced by W UC2 U 
{^0, ^3, 02k-i} by identifying all the vertices in each of the three sets {i;o} U {c2^ : 
i < k), {i;3} U {c2i : i F k}, and then adding the edge xc2k-i’ 

It can be shown that Gx contains no fully odd X4. Since Gx has fewer vertices 
than G, Gx has a 3 -coloring (fx] trivially 4>x{o2k-i) and 4>x{G2 U {1^07^3}) are 
distinct; without loss of generality, 4 >x{G 2 ^{vqF 2 >}) = 1 and 4>x{o2k-i) = 2 . The 
restriction of (jx on Vx is px^ 

Case 1 . 2 . X has some neighbor u in F — with color 1 or 3 . 

In case $\varphi(u) = 1$ (resp. 3), consider the graph $G_x$ obtained from the subgraph of $G$ induced by $V_x \cup C_2 \cup \{v_0, v_3\}$ by identifying all the vertices in $C_2 \cup \{v_0, v_3\}$ and then adding the edge $v_0 x$ (resp. identifying $v_0$ and $x$). It can be shown that $G_x$ contains no fully odd $K_4$. Since $G_x$ has fewer vertices than $G$, $G_x$ has a
3-coloring without loss of generality, 4 >x{G 2 U 4^x{^) = 3 (resp. 

= !)• The restriction of (j)x on Vx is (fx- 

Let us justify the validity of (fx for each x G Di U D 2 - By Lemma 3.5(iv), 
Vx — is nonadjacent to — (C 2 U {uo,U 3 }). For each x E Di^ Lemma 3.5(iv) 
also implies that x is nonadjacent to U — {G 2 U {xo^r^s}). Note that ^x{^) = 2, 
from the construction it can be seen that is compatible with (/?. For each 
X G D 2 , by Lemma 3.5(v) all the neighbors of x in N — Ci has the same color 
in ip. Note that ipx{^) = 1 or 3, from the construction it can be seen that is 
compatible with ip. 

Since ipx{^) 7 ^ ^y{v) whenever x E D\ and y E D 2 , by Lemma 3.5(i)-(iii), 
the union of all cp^’s yields a proper 3-coloring, of Fi which is compatible 
with (/?. □ 



4 Remarks 

The idea underlying the proof of case 1 can be used to handle both case 2 and 
case 3, although the latter two cases are technically more complicated. 

A slight modification of our proof yields a polynomial time algorithm for
coloring graphs containing no fully odd $K_4$ with three colors provided there is a
polynomial time algorithm for recognizing graphs with no fully odd $K_4$. Sewell
and Trotter [11] affirmatively settled a question of Chvátal [3] and proved that
every connected, stability critical graph that is neither $K_2$ nor an odd cycle
contains a fully odd $K_4$. This leads to a polynomial time algorithm for finding
maximum stable sets in graphs with no fully odd $K_4$. Since both the 3-coloring
problem and the maximum stable set problem are hard in general, it would be
interesting to have a polynomial time algorithm for recognizing graphs
containing no fully odd $K_4$. We close with a conjecture that such an algorithm
exists.



Acknowledgments 

This work was carried out while I was a graduate student at Rutgers Center for
Operations Research, Rutgers University; I am very much indebted to Professor
Vašek Chvátal for his invaluable guidance and for his generous help in writing
this paper. I am also grateful to Professor Paul Seymour for his stimulating
suggestions.






Wenan Zang 



References 

1. K. Appel and W. Haken, Every Planar Map Is Four Colorable, A.M.S. Contemp. Math. 98 (1989).

2. P. Catlin, Hajós' Graph-Coloring Conjecture: Variations and Counterexamples, J. Combin. Theory Ser. B 26 (1979), 268-274.

3. V. Chvátal, On Certain Polytopes Associated with Graphs, J. Combin. Theory Ser. B 18 (1975), 138-154.

4. G. A. Dirac, A Property of 4-Chromatic Graphs and Some Remarks on Critical Graphs, J. London Math. Soc. 27 (1952), 85-92.

5. H. Hadwiger, Über eine Klassifikation der Streckenkomplexe, Vierteljahrsch. Naturforsch. Ges. Zürich 88 (1943), 133-142.

6. G. Hajós, Über eine Konstruktion nicht n-färbbarer Graphen, Wiss. Z. Martin-Luther-Univ. Halle-Wittenberg Math.-Natur. Reihe 10 (1961), 116-117.

7. T. R. Jensen and F. B. Shepherd, Note on a Conjecture of Toft, Combinatorica 15 (1995), 373-377.

8. U. Krusenstjerna-Hafstrøm and B. Toft, Special Subdivisions of $K_4$ and 4-Chromatic Graphs, Monatsh. Math. 89 (1980), 101-110.

9. N. Robertson, D. Sanders, P. Seymour and R. Thomas, The Four-Colour Theorem, J. Combin. Theory Ser. B 70 (1997), 2-44.

10. N. Robertson, P. Seymour and R. Thomas, Hadwiger's Conjecture for $K_6$-free Graphs, Combinatorica 13 (1993), 279-361.

11. E. C. Sewell and L. E. Trotter, Stability Critical Graphs and Even Subdivisions of $K_4$, J. Combin. Theory Ser. B 59 (1993), 74-84.

12. C. Thomassen and B. Toft, Non-separating Induced Cycles in Graphs, J. Combin. Theory Ser. B 31 (1981), 199-224.

13. B. Toft, "Problem 10", in: Recent Advances in Graph Theory, Proc. of the Symposium held in Prague, June 1974 (M. Fiedler, ed.), Academia Praha, 1975, pp. 543-544.

14. K. Wagner, Über eine Eigenschaft der ebenen Komplexe, Math. Ann. 114 (1937), 570-590.



A New Family of Optimal 1-Hamiltonian Graphs 
with Small Diameter * 

Jeng-Jung Wang^, Ting-Yi Sung^, Lih-Hsing Hsu^, and Men-Yang Lin^

^ Department of Computer and Information Science, National Chiao Tung 
University, Hsinchu, Taiwan, R.O.C. 

^ Institute of Information Science, Academia Sinica, Taipei, Taiwan, R.O.C. 

^ Department of Information Management, National Taichung Institute of 
Commerce, Taichung, Taiwan, R.O.C. 



Abstract. In this paper, we construct a family of graphs denoted by
$Eye(s)$ that are 3-regular, 3-connected, planar, hamiltonian, edge hamiltonian,
and also optimal 1-hamiltonian. Furthermore, the diameter of
$Eye(s)$ is $O(\log n)$, where $n$ is the number of vertices in the graph and,
to be precise, $n = 6(2^s - 1)$.

1 Introduction 

Given a graph $G = (V, E)$, $V(G) = V$ and $E(G) = E$ denote the vertex set and
the edge set of $G$, respectively. All graphs considered in this paper are undirected
graphs. A simple path (or path for short) is a sequence of adjacent edges $(v_1, v_2)$,
$(v_2, v_3)$, $\ldots$, $(v_{m-1}, v_m)$, written as $\langle v_1, v_2, v_3, \ldots, v_m \rangle$, in which
all of the vertices $v_1, v_2, \ldots, v_m$ are distinct except possibly $v_1 = v_m$. The
path $\langle v_1, v_2, \ldots, v_m \rangle$ is also called a cycle if $v_1 = v_m$ and $m > 3$. A cycle
that traverses every vertex in the graph exactly once is called a hamiltonian
cycle. A graph that contains a hamiltonian cycle is called a hamiltonian graph
or said to be hamiltonian. A graph is edge hamiltonian if each edge in the graph
is incident with some hamiltonian cycle in the graph. The diameter of graph $G$
is the maximum distance among all pairs of vertices in $G$, where the distance
between two vertices $u$ and $v$ is the length of a shortest path joining them.

For $V' \subseteq V$ and $E' \subseteq E$, $G - V' - E'$ denotes the graph obtained by removing
all of the vertices in $V'$ from $V$ and removing the edges incident with at least one
vertex in $V'$ and also all of the edges in $E'$ from $E$. Let $k$ be a positive integer.
A graph $G$ is k-edge hamiltonian if $G - E'$ is hamiltonian for any $E' \subseteq E$ with
$|E'| = k$; $G^*$ is said to be optimal k-edge hamiltonian if $G^*$ contains the least
number of edges among all k-edge hamiltonian graphs having the same number
of vertices as $G^*$. A graph $G$ is k-vertex hamiltonian if $G - V'$ is hamiltonian for
any $V' \subseteq V$ with $|V'| = k$; $G^*$ is said to be optimal k-vertex hamiltonian if $G^*$
contains the least number of edges among all k-vertex hamiltonian graphs having
the same number of vertices as $G^*$. A graph $G$ is k-hamiltonian if $G - V' - E'$
is hamiltonian for any sets $V' \subseteq V$ and $E' \subseteq E$ with $|V'| + |E'| \le k$. It is clear
that every k-hamiltonian graph has at least $k + 3$ vertices. Moreover, the degree
of every vertex in a k-hamiltonian graph is at least $k + 2$. A k-hamiltonian graph
having $n$ vertices is said to be optimal if it contains the least number of edges
among all k-hamiltonian graphs having $n$ vertices.
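
The definitions above can be checked directly on very small graphs. The following is a minimal brute-force sketch, not part of the original paper; the function names and the two test graphs ($K_4$ and the 5-cycle) are illustrative assumptions, and the permutation search is only practical for a handful of vertices.

```python
from itertools import combinations, permutations

def is_hamiltonian(vertices, edges):
    """Brute-force test for a hamiltonian cycle (tiny graphs only)."""
    vs = list(vertices)
    if len(vs) < 3:
        return False
    E = {frozenset(e) for e in edges}
    first, rest = vs[0], vs[1:]
    for perm in permutations(rest):
        cycle = (first,) + perm + (first,)
        if all(frozenset(p) in E for p in zip(cycle, cycle[1:])):
            return True
    return False

def is_k_edge_hamiltonian(vertices, edges, k):
    """G - E' is hamiltonian for every E' with |E'| = k."""
    edges = list(edges)
    return all(
        is_hamiltonian(vertices, [e for e in edges if e not in removed])
        for removed in combinations(edges, k)
    )

def is_k_vertex_hamiltonian(vertices, edges, k):
    """G - V' is hamiltonian for every V' with |V'| = k."""
    for removed in combinations(list(vertices), k):
        gone = set(removed)
        kept_v = [v for v in vertices if v not in gone]
        kept_e = [e for e in edges if gone.isdisjoint(e)]
        if not is_hamiltonian(kept_v, kept_e):
            return False
    return True

# Hypothetical test graphs: the complete graph K4 and the cycle C5.
K4_V = range(4)
K4_E = list(combinations(range(4), 2))
C5_V = range(5)
C5_E = [(i, (i + 1) % 5) for i in range(5)]
```

$K_4$ has exactly $k + 3 = 4$ vertices of degree $k + 2 = 3$ for $k = 1$ and passes both fault tests, whereas the 5-cycle is hamiltonian but fails both, since every vertex has degree two.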

Mukhopadhyaya and Sinha [6] proposed a family of optimal 1-hamiltonian 
graphs which are also planar. These graphs are hamiltonian and have diameter 
of + 2 if n is even, and + 3 if n is odd. For any positive integer k^ 
Harary and Hayes presented families of optimal k-edge hamiltonian graphs and
optimal k-vertex hamiltonian graphs in [3] and [4], respectively. In particular, the
family of optimal 1-edge hamiltonian graphs proposed in [3] is identical to the
family of optimal 1-vertex hamiltonian graphs proposed in [4]. Hence this family of
graphs is 1-hamiltonian. Furthermore, each graph is planar, hamiltonian, and
of diameter + ((§ + 1) mod 2) if n is even, + (([§ J + 1) niod 2) 

if n is odd, where n is the number of vertices in the graph. Wang, Hung, and
Hsu [9] presented another family of optimal 1-hamiltonian graphs, each of which
is planar, hamiltonian, 3-regular, and of diameter $O(\sqrt{n})$, where $n$ is the number
of vertices in the graph. We are interested in finding more families of optimal
1-hamiltonian graphs that are also hamiltonian.

The three families of optimal 1-hamiltonian graphs presented in [4], [6] and 
[9] are planar, 3-regular and hamiltonian. It is natural to ask whether we can 
find such graphs with smaller diameter. This problem relates to the famous 
$(n, d, D)$ problem in which we want to construct a graph of $n$ vertices with
maximum degree $d$ such that the diameter $D$ is minimized. When $d$ and $n$
are given, the lower bound on the diameter, called the Moore bound (on diameter),
is given by $D \ge \log_{d-1} n - \frac{2}{d}$ [2]. In this paper, we propose a family of
optimal 1-hamiltonian graphs that are hamiltonian, edge hamiltonian, planar,
3-regular and 3-connected. Furthermore, the diameter of our graphs is no more
than $4\log_2(n/6 + 1) - 1$, i.e., less than 4 times the Moore bound.

2 Definitions 



Let $N_0 = 3$ and $N_k = 9 \cdot 2^{k-1}$ for any positive integer $k$. Let $[i]_m$ denote $i \bmod m$.
An eye network $Eye(s)$, $s \ge 1$, is a graph with $s + 1$ layers of concentric cycles.
These $s + 1$ cycles are denoted by $I_0, I_1, I_2, \ldots, I_{s-1}$, and $O_s$. In particular, $O_s$ is
the outermost cycle. The vertex set $V(Eye(s))$ is given by $\bigcup_{k=0}^{s-1} V(I_k) \cup V(O_s)$,
where

$V(I_k) = \{(k, j) \mid 0 \le j \le N_k - 1\}$ for $0 \le k \le s - 1$ and
$V(O_s) = \{(s, j) \mid 0 \le j \le N_s - 1 \text{ and } [j]_3 = 0\}$.

For vertex $(k, j)$, $k$ and $j$ are referred to as the first and the second coordinate,
respectively. Throughout this paper all computations on the second coordinate
of a vertex at the $k$th concentric cycle are carried out modulo $N_k$. Graph
$Eye(s)$ contains two types of edges, i.e., cycle edges, denoted by $e_{k,j}$, and
intercycle edges, denoted by $e^*_{k,j}$, which are given as follows:

$e_{k,j} = ((k, j), (k, j + 1))$ for $0 \le k \le s - 1$ and $0 \le j \le N_k - 1$,

$e_{k,j} = ((k, j), (k, j + 3))$ for $k = s$, $0 \le j \le N_s - 1$ and $[j]_3 = 0$,

$e^*_{k,j} = ((0, j), (1, 2j + [j]_3))$ for $k = 0$ and $j = 0, 1, 2$,

$e^*_{k,j} = ((k, j), (k + 1, 2j + [j]_3))$ for $1 \le k \le s - 1$, $0 \le j \le N_k - 1$ and $[j]_3 \ne 0$.

The set $\{e_{k,j} \mid 0 \le j \le N_k - 1\}$ is denoted by $E(I_k)$ if $0 \le k \le s - 1$, and denoted by
$E(O_s)$ if $k = s$. We use $E^*_k$ to denote the set $\{e^*_{k,j} \mid 0 \le j \le N_k - 1\}$. The edge
set of $Eye(s)$ is defined as $E(Eye(s)) = \bigcup_{k=0}^{s-1} E(I_k) \cup E(O_s) \cup \left(\bigcup_{k=0}^{s-1} E^*_k\right)$. We
illustrate $Eye(s)$ for $1 \le s \le 4$ in Figure 1. In all figures, the first coordinate of
each vertex is omitted. Graph $Eye(s)$ is 3-regular and contains
$3 + \sum_{k=1}^{s-1} 9 \cdot 2^{k-1} + 3 \cdot 2^{s-1} = 6(2^s - 1)$ vertices and $9(2^s - 1)$ edges.
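
The definition can be turned into a short constructive check. The sketch below is an illustration rather than the authors' code: the helper names `N`, `eye` and `degrees` are assumptions, and vertices are encoded as the coordinate pairs $(k, j)$ used in the text.

```python
def N(k):
    """Number of positions on layer k: N_0 = 3 and N_k = 9 * 2^(k-1)."""
    return 3 if k == 0 else 9 * 2 ** (k - 1)

def eye(s):
    """Return (vertices, edges) of Eye(s); each edge is a frozenset of two vertices."""
    V = {(k, j) for k in range(s) for j in range(N(k))}
    V |= {(s, j) for j in range(0, N(s), 3)}          # outer cycle O_s
    E = set()
    for k in range(s):
        for j in range(N(k)):
            # cycle edge e_{k,j} on the inner cycle I_k
            E.add(frozenset({(k, j), (k, (j + 1) % N(k))}))
            # intercycle edge e*_{k,j}: all j on layer 0, otherwise [j]_3 != 0
            if k == 0 or j % 3 != 0:
                E.add(frozenset({(k, j), (k + 1, (2 * j + j % 3) % N(k + 1))}))
    for j in range(0, N(s), 3):
        # cycle edge e_{s,j} on the outer cycle O_s
        E.add(frozenset({(s, j), (s, (j + 3) % N(s))}))
    return V, E

def degrees(V, E):
    """Degree of every vertex."""
    deg = dict.fromkeys(V, 0)
    for e in E:
        for v in e:
            deg[v] += 1
    return deg
```

For $s = 1$ this yields the triangular prism (6 vertices, 9 edges); for larger $s$ the vertex count, edge count and 3-regularity can be compared against the formulas stated above.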




Fig. 1. Examples of eye networks 



On the other hand, graphs $Eye(s + 1)$, $s \ge 1$, can be recursively drawn (or
constructed) from $Eye(s)$ by performing the following two steps:

Step_SUBDIVISION. Subdivide each edge $e_{s,j}$ of $O_s$, $0 \le j \le N_s - 1$ and $[j]_3 = 0$,
into a path of length 3, i.e., replace $e_{s,j}$ with a path $\langle (s, j), (s, j + 1), (s, j + 2), (s, j + 3) \rangle$
to connect its two ends $(s, j)$ and $(s, j + 3)$. Rename $O_s$ as $I_s$.

Step_EXTENSION. Construct graph $O_{s+1}$ as a concentric cycle outside $I_s$ and
join every vertex $(s, j)$ in $I_s$ with vertex $(s + 1, 2j + [j]_3)$ in $O_{s+1}$ for $1 \le j \le N_s - 1$
and $[j]_3 = 1$ or 2.



The above recursive construction also shows that $Eye(s)$ is a planar graph.
All vertices in each cycle can be drawn at equal distance, and all intercycle edges
drawn in the normal directions of the corresponding cycles as shown in Figure
1. Therefore, $Eye(s)$ is invariant under the rotation of 120°. Furthermore, we
can use a specified subgraph to obtain other isomorphic subgraphs by proper
rotation, by which we mean rotation of 120° or 240°. To be specific, each vertex
$(k, i)$ is relabelled by $(k, i + \delta N_k/3)$, where $\delta = 1$ for 120° rotation and $\delta = 2$
for 240° rotation. For example, consider a cycle in $Eye(2)$, $H_2 = \langle (0,0), (1,0),
(1,1), (2,3), (2,0), (1,8), (1,7), (2,15), (2,12), (2,9), (2,6), (1,2), (1,3), (1,4),
(1,5), (1,6), (0,2), (0,1), (0,0) \rangle$. Then we can obtain another cycle $C_2$ by 120°
rotation of $H_2$, which is given by $C_2 = \langle (0,1), (1,3), (1,4), (2,9), (2,6), (1,2),
(1,1), (2,3), (2,0), (2,15), (2,12), (1,5), (1,6), (1,7), (1,8), (1,0), (0,0), (0,2),
(0,1) \rangle$.
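
The relabelling rule can be replayed mechanically on the example. This is a small sketch with assumed helper names (`N`, `rotate`); the two vertex sequences are copied from the text above.

```python
def N(k):
    """N_0 = 3 and N_k = 9 * 2^(k-1) for k >= 1."""
    return 3 if k == 0 else 9 * 2 ** (k - 1)

def rotate(path, delta):
    """Relabel each vertex (k, i) as (k, i + delta * N_k / 3), modulo N_k."""
    return [(k, (i + delta * N(k) // 3) % N(k)) for k, i in path]

H2 = [(0, 0), (1, 0), (1, 1), (2, 3), (2, 0), (1, 8), (1, 7), (2, 15), (2, 12),
      (2, 9), (2, 6), (1, 2), (1, 3), (1, 4), (1, 5), (1, 6), (0, 2), (0, 1), (0, 0)]

C2 = [(0, 1), (1, 3), (1, 4), (2, 9), (2, 6), (1, 2), (1, 1), (2, 3), (2, 0),
      (2, 15), (2, 12), (1, 5), (1, 6), (1, 7), (1, 8), (1, 0), (0, 0), (0, 2), (0, 1)]
```

Rotating with `delta = 1` maps the first sequence onto the second, and three successive 120° rotations return the original cycle.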

Note that for any vertex $(k, i)$ in $Eye(s)$ with $k \ge 1$ there exists a vertex
$(k - 1, j)$ such that the distance between $(k, i)$ and $(k - 1, j)$ is at most 2. In
other words, each vertex in $I_k$ can reach $I_{k-1}$ with at most two edges. It follows
that the diameter of $Eye(s)$ is at most $2(2(s - 1) + 1) + 1 = 4s - 1$. Because the
number of vertices in $Eye(s)$ is $6(2^s - 1)$, the diameter of $Eye(s)$ is less than 4
times the Moore bound.
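
For reference, the bound $4s - 1$ can be rewritten in terms of the number of vertices $n$. The following short derivation uses only the vertex count and the diameter bound stated above; the final comparison with $4\log_2 n$ is an added remark that holds for every $n \ge 6$.

```latex
n = 6(2^s - 1) \;\Longrightarrow\; s = \log_2\!\left(\tfrac{n}{6} + 1\right),
\qquad
\mathrm{diam}(Eye(s)) \;\le\; 4s - 1
  \;=\; 4\log_2\!\left(\tfrac{n}{6} + 1\right) - 1
  \;<\; 4\log_2 n .
```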



3 Hamiltonicity of Eye(s)

In this section, we prove that $Eye(s)$ is a hamiltonian graph.

Theorem 1. $Eye(s)$ is hamiltonian for every $s \ge 1$.

Proof: We prove this theorem by induction. For $s = 1$, $H_1 = \langle (0,0), (1,0),
(1,3), (1,6), (0,2), (0,1), (0,0) \rangle$ is a hamiltonian cycle in $Eye(1)$.

Assume that $Eye(k)$ is hamiltonian for all $1 \le k \le s$ and $s \ge 1$. Let $H_s$
be a hamiltonian cycle in $Eye(s)$. Note that since the degree of any vertex in $Eye(s)$
is three, at least one edge of $O_s$ is in $H_s$ and at least one edge of $O_s$ is not
in $H_s$. Since $Eye(s + 1)$ can be obtained from $Eye(s)$ by Step_SUBDIVISION
and Step_EXTENSION, we can construct a cycle $H'_s$ in $Eye(s + 1)$ from $H_s$ by
subdividing the edges in $E(O_s) \cap E(H_s)$. Let $U_s$ denote the collection of vertices
in $V(I_s)$ which are not in $H'_s$. It follows that $H'_s$ is a hamiltonian cycle in the
graph $Eye(s + 1) - V(O_{s+1}) - U_s$.

Observe that vertex $(s + 1, j)$ in $O_{s+1}$ is adjacent to $(s, \lceil j/2 \rceil - 1)$ in $I_s$. In order
to augment $O_{s+1}$ to include all of the vertices in $U_s$, we replace edge $e_{s+1,2m+1}$ in
$O_{s+1}$ with a path $\langle (s + 1, 2m + 1), (s, m), (s, m + 1), (s + 1, 2m + 4) \rangle$ if $(s, m) \in U_s$
and $[m]_3 = 1$. In this way, we attain a cycle $O'_{s+1}$ that traverses each vertex of
$U_s \cup V(O_{s+1})$ exactly once.

Cycles $H'_s$ and $O'_{s+1}$ are two disjoint cycles that traverse every vertex of
$Eye(s + 1)$ exactly once. To combine these two cycles into a hamiltonian cycle,
we arbitrarily choose an edge $e_{s,j}$ in $H'_s$ with $[j]_3 = 1$ and then define a cycle
$H_{s+1}$ given by

$(H'_s - e_{s,j}) \cup (O'_{s+1} - e_{s+1,2j+1}) \cup \{e^*_{s,j}, e^*_{s,j+1}\}$. (1)

Thus the theorem follows. □
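
The inductive step can be replayed as a small program. The sketch below is an illustration under stated assumptions, not the authors' implementation: cycles are stored as sets of edges, all function names are hypothetical, and the default choice `j = 1` for the deleted edge $e_{s,j}$ is an assumption (the proof allows any $j$ with $[j]_3 = 1$ whose edge lies in $H'_s$).

```python
def N(k):
    return 3 if k == 0 else 9 * 2 ** (k - 1)

def edge(u, v):
    return frozenset((u, v))

def eye_edges(s):
    """Edge set of Eye(s), following the definition in Section 2."""
    E = set()
    for k in range(s):
        for j in range(N(k)):
            E.add(edge((k, j), (k, (j + 1) % N(k))))
            if k == 0 or j % 3 != 0:
                E.add(edge((k, j), (k + 1, (2 * j + j % 3) % N(k + 1))))
    for j in range(0, N(s), 3):
        E.add(edge((s, j), (s, (j + 3) % N(s))))
    return E

def path_edges(p):
    return {edge(a, b) for a, b in zip(p, p[1:])}

def scheme_1(H, s, j=1):
    """From the edge set H of a hamiltonian cycle of Eye(s), build one of Eye(s+1)."""
    Ns, Nn = N(s), N(s + 1)
    Hp = set()                                    # H'_s
    for e in H:
        u, v = tuple(e)
        if u[0] == s and v[0] == s:               # edge of O_s: subdivide into 3 edges
            a = u[1] if (u[1] + 3) % Ns == v[1] else v[1]
            Hp |= {edge((s, a), (s, a + 1)), edge((s, a + 1), (s, a + 2)),
                   edge((s, a + 2), (s, (a + 3) % Ns))}
        else:
            Hp.add(e)
    covered = {v for e in Hp for v in e}
    Op = {edge((s + 1, i), (s + 1, (i + 3) % Nn)) for i in range(0, Nn, 3)}
    for m in range(1, Ns, 3):                     # [m]_3 = 1
        if (s, m) not in covered:                 # (s, m) belongs to U_s
            Op.discard(edge((s + 1, 2 * m + 1), (s + 1, (2 * m + 4) % Nn)))
            Op |= {edge((s + 1, 2 * m + 1), (s, m)), edge((s, m), (s, m + 1)),
                   edge((s, m + 1), (s + 1, (2 * m + 4) % Nn))}
    assert j % 3 == 1 and edge((s, j), (s, j + 1)) in Hp
    Hp.discard(edge((s, j), (s, j + 1)))          # delete e_{s,j}
    Op.discard(edge((s + 1, 2 * j + 1), (s + 1, (2 * j + 4) % Nn)))
    return Hp | Op | {edge((s, j), (s + 1, 2 * j + 1)),
                      edge((s, j + 1), (s + 1, (2 * j + 4) % Nn))}

def is_hamiltonian_cycle(H, s):
    """True if H is a single cycle of Eye(s) through all 6(2^s - 1) vertices."""
    if not H <= eye_edges(s):
        return False
    adj = {}
    for e in H:
        u, v = tuple(e)
        adj.setdefault(u, []).append(v)
        adj.setdefault(v, []).append(u)
    n = 6 * (2 ** s - 1)
    if len(adj) != n or any(len(nb) != 2 for nb in adj.values()):
        return False
    start = next(iter(adj))
    prev, cur, steps = None, start, 0
    while True:
        nxt = adj[cur][0] if adj[cur][0] != prev else adj[cur][1]
        prev, cur, steps = cur, nxt, steps + 1
        if cur == start:
            return steps == n

H1 = path_edges([(0, 0), (1, 0), (1, 3), (1, 6), (0, 2), (0, 1), (0, 0)])
```

Starting from $H_1$ and deleting $e_{1,1}$, one step of the scheme reproduces the cycle $H_2$ listed in Section 2; repeating the step gives hamiltonian cycles for larger $s$.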

Henceforth, we write "the (construction) scheme (1)" to mean the construction
scheme presented in the proof of Theorem 1. We use $H'_s$ and $O'_{s+1}$ to
denote the two cycles as specified in the scheme (1). Given $H'_s$ and an edge $e_{s,j}$
of $H'_s$ with $[j]_3 = 1$, the hamiltonian cycle $H_{s+1}$ is uniquely defined. Since
$Eye(s + 1)$ is a 3-regular graph for $s \ge 1$, no two consecutive edges in $O_{s+1}$ can
be excluded from $H_{s+1}$. The following properties hold when deleting edges $e_{s,j}$
in $H'_s$ and $e_{s+1,2j+1}$ in $O'_{s+1}$ with $[j]_3 = 1$ to construct $H_{s+1}$:

(P1) Edges $e_{s+1,2j-2}$ and $e_{s+1,2j+4}$ are included in $H_{s+1}$.

(P2) Edge $e_{s,i} \in H_s$ with $i \ne j - 1$ if and only if $\langle (s + 1, 2i), (s + 1, 2i + 3),
(s + 1, 2i + 6), (s + 1, 2i + 9) \rangle$ is included as a subpath of $H_{s+1}$.

(P3) Any edge $e_{s,i} \notin H_s$ implies $e_{s,i+1} \in H_{s+1}$, $e^*_{s,i+1} \in H_{s+1}$, $e^*_{s,i+2} \in H_{s+1}$ and
$e_{s+1,2i+3} \notin H_{s+1}$.

(P4) Each edge in $O_{s+1} - H_{s+1}$ is given by $e_{s+1,m}$ with $[m]_6 = 3$, and on the
other hand, $H_{s+1}$ includes all of the edges $e_{s+1,l}$ with $[l]_6 = 0$.

(P1) and (P2) can be easily verified. (P3) follows from the definition of $O'_{s+1}$.
The chosen edge in $O'_{s+1}$ to be deleted to form $H_{s+1}$ has the form $e_{s+1,2j+1}$
with $[j]_3 = 1$. Therefore, (P4) follows from (P1) and (P3).

For $s \ge 2$, a hamiltonian cycle $H_s$ constructed by scheme (1) is called the
fundamental hamiltonian cycle of $Eye(s)$ if

(i) $H_1 = \langle (0,0), (1,0), (1,3), (1,6), (0,2), (0,1), (0,0) \rangle$, and

(ii) edge $e_{i,1}$ in $H'_i$ is chosen to be deleted in the recursive construction of
$H_{i+1}$ for all $1 \le i \le s - 1$.

In particular, we use $FH_s$ to denote the fundamental hamiltonian cycle in
$Eye(s)$. An example of the fundamental cycle of $Eye(4)$ is illustrated in Figure 2.
Since $O_1 - H_1$ consists of only one edge and we delete one edge from
$O'_{i+1}$ in the recursive construction of $FH_{i+1}$, $FH_s \cap O_s$ is a set of $s$ disjoint paths
which are denoted by $P_0, P_1, \ldots, P_{s-1}$. To be precise,

$P_0 = \langle (s, 0), (s, 3) \rangle$,

$P_i = \langle (s, 3 \cdot 2^i), (s, 3 \cdot 2^i + 3), (s, 3 \cdot 2^i + 6), \ldots, (s, 3 \cdot 2^{i+1} - 3) \rangle$
for $1 \le i \le s - 2$,

$P_{s-1} = \langle (s, 3 \cdot 2^{s-1}), (s, 3 \cdot 2^{s-1} + 3), (s, 3 \cdot 2^{s-1} + 6), \ldots, (s, N_s - 3) \rangle$.

Path $P_i$ has $2^i - 1$ edges for $1 \le i \le s - 2$, and $2^s - 1$ edges for $i = s - 1$.
Furthermore, there is a set of $2^s - 2$ intercycle edges, given by
$T_s = \{e^*_{s-1,3 \cdot 2^{s-2}+1}, e^*_{s-1,3 \cdot 2^{s-2}+2}, e^*_{s-1,3 \cdot 2^{s-2}+4}, \ldots, e^*_{s-1,N_{s-1}-4}\}$,
which are not included in $FH_s$.

Fig. 2. The fundamental cycle of Eye(4)



Remark 1: Since ^ < \e{oI)\ < | for s > 2 where |X| denotes the number 
of edges in X, we can always obtain a hamiltonian cycle in Eye(s) containing 
any specific edge e G E{Os) by proper rotation of EHs- Moreover, since ^ < 

I < |, we can always obtain a hamiltonian cycle in Eye{s) — e by rotating 
Tl~is properly for any edge e G E^_^ and s>2. 

4 1-Edge Hamiltonicity of Eye(s)

The hamiltonian cycle $H_1 = \langle (0,0), (1,0), (1,3), (1,6), (0,2), (0,1), (0,0) \rangle$ of
$Eye(1)$ does not include edges $e_{1,6}$, $e_{0,2}$ and $e^*_{0,1}$. By proper rotation of $H_1$,
we can obtain two other distinct hamiltonian cycles $H^1_1 = \langle (0,1), (1,3), (1,6),
(1,0), (0,0), (0,2), (0,1) \rangle$ and $H^2_1 = \langle (0,2), (1,6), (1,0), (1,3), (0,1), (0,0),
(0,2) \rangle$ such that edges $e_{1,0}$, $e_{0,0}$ and $e^*_{0,2}$ are not in $H^1_1$ and edges $e_{1,3}$, $e_{0,1}$ and
$e^*_{0,0}$ are not in $H^2_1$. Thus, $Eye(1)$ is a 1-edge hamiltonian graph. For $Eye(2)$, it can be
verified by exhaustive construction that $Eye(2) - e_{2,0}$ is not hamiltonian. Hence,
$Eye(2)$ is not 1-edge hamiltonian. Furthermore, $Eye(2) - e$ is not hamiltonian
if $e \in \{e_{2,0}, e_{2,6}, e_{2,12}\}$, and is hamiltonian otherwise.
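
The exhaustive verification mentioned for $Eye(2)$ is small enough to run as a backtracking search over its 18 vertices. This is a sketch with assumed function names (`eye`, `has_hamiltonian_cycle`), intended only to replay the claim, not to reproduce the authors' procedure.

```python
def N(k):
    return 3 if k == 0 else 9 * 2 ** (k - 1)

def eye(s):
    """Vertices (sorted list) and edges (set of frozensets) of Eye(s)."""
    E = set()
    for k in range(s):
        for j in range(N(k)):
            E.add(frozenset({(k, j), (k, (j + 1) % N(k))}))
            if k == 0 or j % 3 != 0:
                E.add(frozenset({(k, j), (k + 1, (2 * j + j % 3) % N(k + 1))}))
    for j in range(0, N(s), 3):
        E.add(frozenset({(s, j), (s, (j + 3) % N(s))}))
    V = sorted({v for e in E for v in e})
    return V, E

def has_hamiltonian_cycle(V, E):
    """Depth-first backtracking search for a hamiltonian cycle."""
    adj = {v: [] for v in V}
    for e in E:
        u, w = tuple(e)
        adj[u].append(w)
        adj[w].append(u)
    start, n = V[0], len(V)
    path, seen = [start], {start}

    def extend():
        u = path[-1]
        if len(path) == n:
            return start in adj[u]        # can the path be closed into a cycle?
        for w in adj[u]:
            if w not in seen:
                path.append(w)
                seen.add(w)
                if extend():
                    return True
                path.pop()
                seen.remove(w)
        return False

    return extend()

V2, E2 = eye(2)
# Edges whose removal destroys hamiltonicity of Eye(2).
bad = {e for e in E2 if not has_hamiltonian_cycle(V2, E2 - {e})}
```

Under this encoding, `bad` should consist of exactly the three outer-cycle edges $e_{2,0}$, $e_{2,6}$ and $e_{2,12}$.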

Let $E^1_s = \{e_{s,m} \mid [m]_6 = 0,\ 0 \le m < N_s\}$ and $E^2_s = \{e_{k,m} \mid [m]_6 = 0 \text{ or } 2,\ 0 \le m < N_k,\ 2 \le k \le s - 1\}$
be two subsets of $E(Eye(s))$. By definition, we have
El = El = 0 . 

Lemma 2. $Eye(s) - e$ is hamiltonian for all $e \in E(Eye(s)) - (E^1_s \cup E^2_s)$.

The following two lemmas are provided with proofs omitted.

Lemma 3. $Eye(s) - e$ is hamiltonian for $e \in E^1_s$ and $s \ge 3$.

Lemma 4. $Eye(s) - e$ is hamiltonian for $e \in E^2_s$ and $s \ge 3$.

We illustrate the special cases of $Eye(3)$ in Lemma 3 and Lemma 4 in Figures
3(a)-3(d). Lemmas 2, 3, and 4 immediately lead to the following theorem.




Fig. 3. The hamiltonian cycle of Eye(3) does not traverse the edges $e_{3,m}$ where
m = 0, 6, 12, 18 in (a) and m = 24, 30, 0, 6 in (b). The hamiltonian cycle of Eye(3)
does not contain the edges $e_{2,m}$ where m = 0, 2, 6, 8 in (c) and m = 0, 2, 12, 14
in (d).



Theorem 5. $Eye(s)$ is 1-edge hamiltonian for every $s \ge 1$ and $s \ne 2$.

Because the degree of every vertex in $Eye(s)$ is three, we have the following
corollary:

Corollary 6. $Eye(s)$ is optimal 1-edge hamiltonian for $s \ne 2$.

To verify the relationship between 1-edge hamiltonicity and edge hamiltonicity,
we have the following theorem.

Theorem 7. Any 3-regular 1-edge hamiltonian graph is edge hamiltonian.

Proof: Let $G$ be a 3-regular 1-edge hamiltonian graph. Let $e$ be an edge in
$G$ and $e'$ be one of the four edges that are incident with $e$. Since $G$ is 1-edge
hamiltonian and 3-regular, there exists a hamiltonian cycle $C$ in $G - e'$ such that
$e$ is in $C$: the common endpoint of $e$ and $e'$ has degree two in $G - e'$, so $C$ must
use both of its remaining edges, one of which is $e$. Thus $G$ is edge hamiltonian. □

Theorem 8. $Eye(s)$ is edge hamiltonian for every $s$.

Proof: It can be verified that $Eye(2)$ is edge hamiltonian. Furthermore, it follows
from Theorems 5 and 7 that $Eye(s)$ is edge hamiltonian for every $s \ne 2$. □

5 1-Vertex Hamiltonicity of Eye(s)

In this section, we first show that $Eye(s) - x$, $s \ge 2$, is hamiltonian where
$x \in V(Eye(s)) - V(Eye(s - 1))$. The set $V(Eye(s)) - V(Eye(s - 1))$ for $s \ge 2$
can be partitioned into the following four subsets:

$V^1_s = \{(s, m) \mid [m]_{12} = 3 \text{ or } 9,\ 0 \le m \le N_s - 1\}$,

$V^2_s = \{(s, m) \mid [m]_{12} = 0 \text{ or } 6,\ 0 \le m \le N_s - 1\}$,

$V^3_s = \{(s - 1, m) \mid [m]_6 = 1 \text{ or } 4,\ 0 \le m \le N_{s-1} - 1\}$, and
$V^4_s = \{(s - 1, m) \mid [m]_6 = 2 \text{ or } 5,\ 0 \le m \le N_{s-1} - 1\}$.

To prove that $Eye(s) - x$ is hamiltonian for $s \ge 2$ and $x \in V(Eye(s)) -
V(Eye(s - 1))$, we first construct a hamiltonian cycle $H_{s-1}$ in $Eye(s - 1)$ and
then augment it with $O_s$ without traversing vertex $x$. Since $Eye(2) - e$ is not
hamiltonian if $e \in \{e_{2,0}, e_{2,6}, e_{2,12}\}$, it will yield a special case of $x$ in each of
$V^1_s$, $V^2_s$, $V^3_s$, and $V^4_s$ for the construction of $H_{s-1}$ as stated in the proofs of the
following lemmas.

Lemma 9. $Eye(s) - x$ is hamiltonian for $x \in V^1_s$ and $s \ge 2$.

Proof: The special case is given by $s = 3$ and $x \in \{(3,3), (3,15), (3,27)\}$. We
can construct a hamiltonian cycle in $Eye(3) - x$ (the construction is omitted
here).

For the cases of s = 2 and x = (s, m), we define Hs-i as 77} for m = 3, as Hi 
for m = 9, and as Hi for m = 15, where 77i, 77}, and 77} are defined in the first 
paragraph of Section 4. For the remaining cases of s and x = (s,m), let 77s-i 
denote a hamiltonian cycle of Eye{s — 1) — e 5 _iym ]_2 obtained by applying the 
construction scheme (1) if e 5 _iym ^_2 ^ ^l-i^ and by applying the construction 
presented in the proof of Lemma 2 if e 5 _iym ^_2 G El_i. Since deleting vertex 
X from Eye{s) reduces the degree of each vertex of (s,m — 3), (s,m -h 3) and 
(s — 1, — 1) by one, it follows that any hamiltonian cycle in Eye{s) — x must 
traverse edges , e^^^+s, and

Let V denote the path {O^s ~ es,m -3 - ]-i) ^ ]- 2 , _ 3 |. 

Then V contains all vertices in (V {Eye{s)) — V{Ws-i) — x) U {(s — 1, [y] — 
2), (s — 1, — 3)}. Furthermore, V and H' s-i ~ e 5 _ijm ]_3 are two disjoint 

paths with common end vertices (s — 1 ,|~y] — 2), (s — — 3) and contain 

all vertices in Eye{s) — x. Thus, V U (7Y^s-i — eg_i^^rn^_^) defines a hamiltonian 
cycle of Eye(s) — x. □ 

The following three lemmas are provided with proofs omitted.

Lemma 10. $Eye(s) - x$ is hamiltonian for $x \in V^2_s$ and $s \ge 2$.

Lemma 11. $Eye(s) - x$ is hamiltonian for $x \in V^3_s$ and $s \ge 2$.

Lemma 12. $Eye(s) - x$ is hamiltonian for $x \in V^4_s$ and $s \ge 2$.

Theorem 13. $Eye(s)$ is 1-vertex hamiltonian for every $s \ge 1$.

Proof: For $Eye(1)$, the cycle $\langle (0,0), (0,1), (1,3), (1,6), (0,2), (0,0) \rangle$ does not
include vertex $(1,0)$. Since this cycle is unique of length five up to isomorphism,
it follows that $Eye(1)$ is 1-vertex hamiltonian. Now consider $s \ge 2$. Let $x =
(k, m) \in V(Eye(s))$, where $0 \le k \le s$. If $k = s$, it follows from Lemmas 9, 10, 11, and
12 that $Eye(s) - x$ is hamiltonian. If $k \le s - 1$, then we distinguish the following
four cases of $x$.

case 1: Let k = 0. Since Eye(l) is 1-vertex hamiltonian, we can obtain a hamil- 
tonian cycle, denoted by AfH\^ in Eye{l) — x. 
case 2: Let A: = 1 and [ra]s = 0. Similarly, we can obtain a hamiltonian cycle, 
denoted by A/’7^2, in ~ 

case 3: Let k > 2 and [ra]s = 0. Since x G U V|, it follows from Lemma 4 
and Lemma 5 that we can obtain a hamiltonian cycle, denoted by NELk^ in 
Eye(k) — x. 

case 4: Let A: > 1 and [rn]s ^ 0. Since x G Vk-\-i i^ follows from Lemma 6 

and Lemma 7 that we also obtain a hamiltonian cycle, denoted by A/’TY^+i, 
in Eye(k + 1) — x. 

In case 3, J\fHk is a cycle that traverses all vertices in Eye{k) except x exactly 
once. Let J\fH^k denote the cycle obtained from J\fHk by subdividing all edges 
in {Ok — x)C\AfELk- By performing Step_SUBDIVISION and Step_EXTENSION on 
Eye{k)^ J\fEL^ k will not include vertices (A:, m — 2), (A:, m — 1), {k,m)^ (A:, m + 1) 
and (A:, m+2). To construct a cycle, denoted by A/’7 Y|+i, in Eye{k + 1) to traverse 
all vertices excluding x exactly once, we use the scheme (1) and replace the path 
((A: + 1, 2m — 3), (A: + 1,2m), (A: + 1,2m + 3), (A: + 1,2m + 6)) in with path 

((A: + 1, 2m — 3), (A:,m — 2), (A:,m — 1), (A: + l,2m), (A: + l,2m + 3), (A:,m + 1), 
{k^rn -h 2), {k + 1,2m + 6)). In case 2, we can obtain a hamiltonian cycle in 
Eye{2) — x, denoted by NEL^^ using the same approach as in case 3. 

We then apply the scheme (1) on MEL\, MEL\, MEL\j^i or A/’TY^+i, respec- 
tively, to construct a hamiltonian cycle in Eye(s) — x. □ 

Since $Eye(s)$ is 1-vertex hamiltonian and 3-regular, the following corollaries
hold.

Corollary 14. $Eye(s)$ is 3-connected for every $s$.

Corollary 15. $Eye(s)$ is optimal 1-vertex hamiltonian for every $s$.

Corollary 6 and Corollary 15 immediately lead to the following theorem.

Theorem 16. $Eye(s)$ is optimal 1-hamiltonian for $s \ne 2$.

6 Concluding Remarks 

In summary, $Eye(s)$ is optimal 1-edge hamiltonian for $s \ne 2$, optimal 1-vertex
hamiltonian for every $s$, and optimal 1-hamiltonian for $s \ne 2$. Furthermore,
$Eye(s)$ is edge hamiltonian for every $s$ and is planar. Its diameter is $O(\log n)$ and,
to be precise, bounded by $4\log_2(n/6 + 1) - 1$, where $n$ is the number of vertices in $Eye(s)$.
It would be interesting to construct a family of planar optimal 1-hamiltonian
graphs which share these properties of $Eye(s)$ and have even smaller diameter
than $Eye(s)$.



REFERENCES 

[1] J.A. Bondy and U.S.R. Murty, Graph Theory with Applications, North-Holland, New York, (1980).

[2] F.R.K. Chung, Diameters of graphs: Old problems and new results, Proc. 18th South-Eastern Conf. Combinatorics, Graph Theory, and Computing, Congressus Numerantium, 60(1987), 298-319.

[3] F. Harary and J.P. Hayes, Edge fault tolerance in graphs, Networks, 23(1993), 135-142.

[4] F. Harary and J.P. Hayes, Node fault tolerance in graphs, Networks, 27(1996), 19-23.

[5] P. Horák and J. Širáň, On a construction of Thomassen, Graphs and Combinatorics, 2(1986), 347-350.

[6] K. Mukhopadhyaya and B.P. Sinha, Hamiltonian graphs with minimum number of edges for fault-tolerant topologies, Information Processing Letters, 44(1992), 95-99.

[7] L. Stacho, Maximally non-hamiltonian graphs of girth 7, Graphs and Combinatorics, 12(1996), 361-371.

[8] C. Thomassen, Hypohamiltonian and hypotraceable graphs, Discrete Math., 9(1974), 91-96.

[9] J.J. Wang, C.N. Hung, and L.H. Hsu, Optimal 1-hamiltonian graphs, accepted by Information Processing Letters.



A Linear-Time Algorithm for Constructing an 
Optimal Node-Search Strategy of a Tree 



Sheng-Lung Peng^, Chin-Wen Ho^, Tsan-sheng Hsu^, Ming-Tat Ko^, and Chuan Yi Tang^

^ National Tsing Hua University, Taiwan
^ Academia Sinica, Taiwan
^ National Central University, Taiwan



Abstract. Ellis et al. [9] proposed algorithms (in terms of vertex separation)
to compute the node-search number of an n-vertex tree T in O(n)
time and to construct an optimal node-search strategy of T in O(n log n)
time. An open problem is whether the latter can also be done in linear
time. In this paper, we solve this open problem by exploring fundamental
graph theoretical properties.



1 Introduction 

The graph searching problem was first proposed by Parsons [22]. In the original
version, called edge searching, an undirected graph is considered as a system
of tunnels in which all the tunnels are initially contaminated by a gas. There
are three kinds of moves in edge searching: (1) placing a searcher on a vertex;
(2) removing a searcher from a vertex; and (3) moving a searcher from one
vertex to another along an edge. A contaminated edge is cleared if move (3)
is applied. Another version, called node searching, was proposed by
Kirousis and Papadimitriou [16]. In node searching, move (3) is not allowed
and a contaminated edge is cleared if both of its endpoints simultaneously
contain searchers. The objective of the graph searching problem is to obtain a state
of the graph in which all the edges are simultaneously cleared by a sequence of
allowed moves using the least number of searchers. The graph searching problem
is not only interesting theoretically, but also has applications to combinatorial
problems [2,12,15,17,20,25].

A search strategy is a sequence of allowed moves to clear the initially
contaminated graph. A search strategy is optimal if it uses the minimum number of
searchers among all possible search strategies. The number of searchers needed
in an optimal search strategy of G is called the search number of G. In node
searching, we call it the node-search number of G and denote it as ns(G). A
cleared edge may be recontaminated if there is a path from a contaminated edge
to the cleared edge without any searcher on its vertices. It has been shown in
[16] and [3] that there always exists an optimal node-search strategy for a graph
that does not recontaminate any edge. In the rest of the paper, we only consider
node-search strategies which do not recontaminate any edge.
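
The node-searching rules above can be made concrete with a small simulator. This is an illustrative sketch, not an algorithm from the paper: the function name `run_node_search`, the move encoding `("place", v)` / `("remove", v)` and the three-vertex example are assumptions.

```python
def run_node_search(edges, moves):
    """Simulate node searching; return (cleared edges, peak number of searchers)."""
    edges = [frozenset(e) for e in edges]
    guarded, cleared, peak = set(), set(), 0
    for action, v in moves:
        if action == "place":
            guarded.add(v)
            peak = max(peak, len(guarded))
            # an edge is cleared when both endpoints hold searchers
            cleared |= {e for e in edges if e <= guarded}
        else:  # "remove"
            guarded.discard(v)
            # recontamination spreads through unguarded vertices
            changed = True
            while changed:
                changed = False
                for e in edges:
                    if e in cleared:
                        continue
                    for u in e:
                        if u in guarded:
                            continue
                        for f in edges:
                            if u in f and f in cleared:
                                cleared.discard(f)
                                changed = True
    return cleared, peak

# Hypothetical example: the path a - b - c.
PATH = [("a", "b"), ("b", "c")]
GOOD = [("place", "a"), ("place", "b"), ("remove", "a"), ("place", "c")]
BAD = [("place", "a"), ("place", "b"), ("remove", "b"), ("place", "c")]
```

The first strategy clears both edges with two searchers and never recontaminates; in the second, removing the searcher from `b` lets the contaminated edge `bc` recontaminate `ab`.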
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The node searching problem attracted the attention of the theoretical com- 
puter science community because of its equivalences with different seemingly un- 
related problems. For example, the following numbers coincide: the node-search 
number of G, the interval thickness of G [15], the pathwidth of G plus one [25,20], 
and the vertex separation number of G plus one [16,12]. Several excellent surveys 
for these problems can be found in [2,4,10,20]. The node searching problem is 
NP-complete on planar graphs with vertex degree at most three [21], chordal 
graphs [11], bipartite graphs [13], cobipartite graphs [1], and bipartite distance- 
hereditary graphs [14]. For some special classes of graphs, it can be solved in 
polynomial time [5,6,7,8,9,11,20,24,26]. 
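
The stated coincidence between the node-search number and the vertex separation number plus one can be checked on very small trees by exhaustive search. This is only an illustrative sketch: the function names are hypothetical, the adjacency-dictionary input is an assumption, and the brute force over all vertex orderings is exponential, unlike the linear-time algorithms cited in the text.

```python
from itertools import permutations


def vertex_separation(adj):
    """Brute-force vertex separation of a small graph.

    adj: dict mapping each vertex to a list of its neighbours.
    For every linear layout, count at each position the vertices placed so far
    that still have a neighbour placed later; minimise the worst count.
    """
    vertices = list(adj)
    best = len(vertices)
    for order in permutations(vertices):
        pos = {v: i for i, v in enumerate(order)}
        worst = 0
        for i in range(len(order)):
            cut = sum(
                1 for v in order[: i + 1] if any(pos[w] > i for w in adj[v])
            )
            worst = max(worst, cut)
        best = min(best, worst)
    return best


def node_search_number(adj):
    """Node-search number via the identity ns(G) = vs(G) + 1 quoted above."""
    return vertex_separation(adj) + 1


path3 = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
# Spider with three legs of length two attached to a centre c.
spider = {
    "c": ["a1", "b1", "d1"],
    "a1": ["c", "a2"], "a2": ["a1"],
    "b1": ["c", "b2"], "b2": ["b1"],
    "d1": ["c", "d2"], "d2": ["d1"],
}
print(node_search_number(path3), node_search_number(spider))
```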

For trees, the following results are known. Ellis et al. [9] proposed a linear- 
time algorithm (in terms of vertex separation) to find the node-search number 
of a tree T. However, this algorithm cannot be used to construct, in linear time, 
an optimal node-search strategy of T. Scheffler [26] presented a fairly 
complicated linear-time algorithm (in terms of pathwidth) to compute the 
node-search number of T, and she claimed that a linear-time algorithm for 
constructing an optimal node-search strategy of T easily followed. Unfortunately, 
few details are recorded in the literature. 

Megiddo et al. [19] proposed a linear-time algorithm to find the edge-search 
number of a tree T. They also gave an algorithm to construct an optimal edge- 
search strategy of T. However, this algorithm runs in O(n log n) time, where n 
denotes the number of vertices of T. Peng et al. [23] showed that an optimal 
edge-search strategy of T can be obtained in linear time if an optimal 
node-search strategy of T is given. 

Peng et al. [23] proposed the concept of the avenue on a tree for node search- 
ing. In this paper we extend this avenue concept of a tree to an extended avenue 
system and an avenue tree. Based on these structures, we design a linear-time 
algorithm to construct an optimal node-search strategy of a tree. 

2 The Extended avenue system and the avenue tree 

2.1 The Extended avenue system 

Let $T$ be an unrooted tree, and let $V(T)$ and $E(T)$ denote the vertex and edge 
sets of $T$, respectively. For a vertex $t \in V(T)$, a connected component of 
$T \setminus \{t\}$ is called a branch of $T$ at $t$. Let $v$ be adjacent to $t$ in $T$. 
The branch of $T$ at $t$ containing $v$ is denoted as $T_{tv}$. 
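
The notion of a branch can be made concrete with a short traversal. The sketch below is an illustration only; the name `branches` and the adjacency-dictionary representation are assumptions. For a vertex t it returns the vertex set of each branch, keyed by the neighbour v of t that the branch contains.

```python
def branches(adj, t):
    """Return {v: vertex set of the branch of the tree at t containing v}."""
    result = {}
    for v in adj[t]:
        seen = {t, v}          # t is marked so the walk never crosses it
        stack = [v]
        while stack:
            x = stack.pop()
            for y in adj[x]:
                if y not in seen:
                    seen.add(y)
                    stack.append(y)
        result[v] = seen - {t}
    return result


tree = {
    "c": ["a1", "b1"],
    "a1": ["c", "a2"], "a2": ["a1"],
    "b1": ["c"],
}
print(branches(tree, "c"))
```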

Let $P$ be a path in $T$. Given a vertex $x \in V(P)$ and an edge $(x,y) \in E(T)$, 
$T_{xy}$ is a path branch at $P$ if $y \in V(P)$ and a nonpath branch at $P$ if 
$y \notin V(P)$. If $|V(T)| > 1$, then $ns(T) \ge 2$. We define $ns(T) = 1$ if 
$|V(T)| = 1$. Thus, $ns(T) \ge 2$ if and only if there exists a vertex 
$t \in V(T)$ with at least one branch. 

Lemma 1. [9,26] For any tree $T$, $ns(T) \ge k + 1$ for $k \ge 2$ if and only if there 
exists a vertex $t \in V(T)$ with at least three branches $T_{tu}$, $T_{tv}$, and $T_{tw}$ whose 
node-search numbers are all at least $k$. For any tree $T$, $ns(T) \ge 2$ if and only if 
there exists a vertex $t \in V(T)$ with at least one branch. 
Using Lemma 1, Peng et al. [23] proposed the concept of an avenue of a tree 
in node searching. A path $P$ is an avenue of $T$ if the following conditions hold: 

1. for every path branch $T'$ at $P$, $ns(T') = ns(T)$; and 

2. for every nonpath branch $T'$ at $P$, $ns(T') < ns(T)$. 

Note that if $|V(P)| = 1$, then $P$ is also called a hub. When $T$ has a hub as an 
avenue, another vertex of $T$ may also be a hub. In general, a hub is not unique. 
Peng et al. showed that any tree $T$ has either a hub or a unique avenue 
[23]. In fact, if $T$ has an avenue having at least two vertices, then the avenue is 
induced by the edges $(x,y)$ such that $ns(T_{xy}) = ns(T_{yx}) = ns(T)$. Now we extend 
the concept of an avenue as follows. A path $P$ is an extended avenue of a tree 
$T$ if the following conditions hold: 

1. for every path branch $T'$ at $P$, $ns(T') \le ns(T)$; and 

2. for every nonpath branch $T'$ at $P$, $ns(T') < ns(T)$. 

Note that an avenue of $T$ is also an extended avenue of $T$. Let $P$ be an 
extended avenue of $T$. Then $P$ must contain an avenue of $T$. Path branches 
(respectively, nonpath branches) at $P$ are also called the avenue branches 
(respectively, nonavenue branches) at $P$. In the following, we define tag values 
on the vertices of $T$, $\mathcal{A}(T)$, and $\mathcal{F}(\mathcal{A}(T))$ recursively as follows. 

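1. If $V(T) = \{v\}$, then 

(a) $tag(v) = 1$; 

(b) $\mathcal{A}(T) = \{[v]\}$; 

(c) $\mathcal{F}(\mathcal{A}(T)) = \{T\}$. 

2. If $|V(T)| > 1$, let $P$ be an extended avenue of $T$. Let $\mathcal{T}_P$ denote the set 
$\{T' \mid T'$ is a connected component after removing $P$ from $T\}$, then 

(a) for all $v \in V(P)$, $tag(v) = ns(T)$; 

(b) $\mathcal{A}(T) = \{P\} \cup \bigl(\bigcup_{T' \in \mathcal{T}_P} \mathcal{A}(T')\bigr)$; 

(c) $\mathcal{F}(\mathcal{A}(T)) = \{T\} \cup \bigl(\bigcup_{T' \in \mathcal{T}_P} \mathcal{F}(\mathcal{A}(T'))\bigr)$. 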

As defined above, $\mathcal{A}(T)$ is a set of vertex-disjoint paths of $T$. We call $\mathcal{A}(T)$ 
an extended avenue system of $T$. The set $\mathcal{F}(\mathcal{A}(T))$ corresponds to $\mathcal{A}(T)$. By 
definition, for each path $P \in \mathcal{A}(T)$, there exists a tree $T' \in \mathcal{F}(\mathcal{A}(T))$ such that 
$P$ is an extended avenue of $T'$. Hence $|\mathcal{A}(T)| = |\mathcal{F}(\mathcal{A}(T))|$. For convenience, we 
denote $T'$ by $B_P$ in the rest of the paper. Let $P$ be any maximal path induced by 
vertices with the same tag value in $T$. Then $P \in \mathcal{A}(T)$. In other words, $\mathcal{A}(T)$ 
can also be determined according to the tag values of all vertices of $T$. Note that 
once $\mathcal{A}(T)$ is defined, the tag value of every vertex of $T$ and $\mathcal{F}(\mathcal{A}(T))$ are then 
well defined, and we will refer to them without further specification. Since an 
extended avenue of $T$ is not unique, $\mathcal{A}(T)$ is also not unique. 

By definition, all the vertices in a path $P \in \mathcal{A}(T)$ have the same tag 
value. Thus we may define the tag value of $P$ in $\mathcal{A}(T)$ to be the tag value of a 
vertex of $P$, i.e., $tag(P) = tag(v)$ for $v \in V(P)$. Note that if $tag(P) = 1$, then 
$|V(P)| = 1$. For simplicity, all the paths in $\mathcal{A}(T)$ are called avenues. In particular, 
the path in $\mathcal{A}(T)$ with the tag value $ns(T)$ is called the main avenue of $T$ with 
respect to $\mathcal{A}(T)$. Note that if $P \in \mathcal{A}(T)$ is the main avenue of $T$, then $B_P = T$. 
For any two avenues $P$ and $Q$ in $\mathcal{A}(T)$, let $C_{PQ}$ denote the connecting path 
of $P$ and $Q$, which is the path $[u, w_1, \ldots, w_s, v]$ in $T$ such that $u \in V(P)$, 
$v \in V(Q)$, and $w_i \notin V(P) \cup V(Q)$ for all $i$. Every vertex $w_i$, $1 \le i \le s$, is called 
an internal vertex of $C_{PQ}$. Since $T$ is a tree, there is exactly one connecting path 
that connects any two avenues of $\mathcal{A}(T)$. By the definition of $\mathcal{A}(T)$, it is not hard to 
see that for any two avenues with the same tag value $s$ in $\mathcal{A}(T)$, $s < ns(T)$, their 
connecting path contains at least one vertex whose tag value is greater than $s$. 
For each $P \in \mathcal{A}(T)$, let $\mathcal{A}_P = \{Q \in \mathcal{A}(T) \mid tag(P) < tag(Q)$ and for each 
internal vertex $v \in V(C_{PQ})$, $tag(v) < tag(P)\}$. Note that if $P$ is the main 
avenue of $T$, then $\mathcal{A}_P = \emptyset$; otherwise $\mathcal{A}_P \ne \emptyset$. It is not hard to show that for 
any avenue $P \in \mathcal{A}(T)$ which is not the main avenue of $T$, all the tag values of 
the avenues in $\mathcal{A}_P$ are distinct. 



2.2 The Avenue tree 

We use the notation "$v \to u$" in a rooted tree to denote that the parent of $v$ is $u$. 
With respect to an extended avenue system $\mathcal{A}(T)$, an avenue tree $\mathcal{L}(T)$, which 
is a rooted tree, is defined as follows. 

1. If $T$ is an isolated vertex $v$, then $v$ itself is the root of $\mathcal{L}(T)$. 

2. If $T$ is not an isolated vertex, then let $P = [v_1, v_2, \ldots, v_r] \in \mathcal{A}(T)$ be the 
main avenue of $T$ and let $\mathcal{T}_P$ be the set of all the nonavenue branches at $P$. 
We construct $\mathcal{L}(T)$ as follows. 

(a) We arbitrarily choose a vertex $v_i$ from $V(P)$ as the root of $\mathcal{L}(T)$. 

(b) Let $v_j \to v_{j+1}$ for $1 \le j \le i - 1$. 

(c) Let $v_j \to v_{j-1}$ for $i + 1 \le j \le r$. 

(d) For each $T' \in \mathcal{T}_P$, let $w$ be the root of $\mathcal{L}(T')$ and assume $T'$ is a nonavenue 
branch at $v_j \in V(P)$; then let $w \to v_j$. 

Since the root of $\mathcal{L}(T)$ is arbitrarily chosen, $\mathcal{L}(T)$ is not unique. A rooted 
tree is called a rooted path if the underlying unrooted tree is a path. From the 
definition of $\mathcal{L}(T)$, every avenue in $\mathcal{A}(T)$ is a rooted path in $\mathcal{L}(T)$. In $\mathcal{L}(T)$, the 
tag value of every internal vertex is greater than 1 and every vertex with tag 
value 1 is a leaf. For any avenue $P \in \mathcal{A}(T)$ with $tag(P) > 1$ and a vertex $v \in V(P)$, 
if $v$ is a leaf of $\mathcal{L}(T)$ then (i) $v$ is not the root of $P$ in $\mathcal{L}(T)$; (ii) $v$ is an endpoint 
of $P$; and (iii) there is no nonavenue branch at $v$ in $B_P$. For $u, v \in V(\mathcal{L}(T))$ and 
$v \to u$ in $\mathcal{L}(T)$, let $P, Q \in \mathcal{A}(T)$ be the avenues containing $v$ and $u$, respectively. 
In the case of $tag(v) < tag(u)$, we call $Q$ the parent of $P$ in $\mathcal{L}(T)$. We define 
$\min(\mathcal{A}_P)$ to be the avenue in $\mathcal{A}_P$ with the smallest tag value. 

Lemma 2. For $Q \in \mathcal{A}(T)$, $B_P$ is a nonavenue branch at $Q$ in $B_Q$ if and only 
if $Q = \min(\mathcal{A}_P)$. 

Lemma 3. Let $P \in \mathcal{A}(T)$ be an avenue which is not the main avenue of $T$. Then in any 
$\mathcal{L}(T)$, the parent of $P$ is $Q$ if and only if $Q = \min(\mathcal{A}_P)$. 
Lemma 4. Let $\mathcal{A}(T)$ be an extended avenue system of $T$ and $T^*$ be a rooted 
tree with $V(T^*) = V(T)$. $T^*$ is an avenue tree of $T$ with respect to $\mathcal{A}(T)$ if $T^*$ 
satisfies the following conditions. 

1. For each $P \in \mathcal{A}(T)$, $V(P)$ induces a rooted path in $T^*$. 

2. For each $P \in \mathcal{A}(T)$ with $tag(P) < ns(T)$, let $Q = \min(\mathcal{A}_P)$ and let $B_P$ be a 
nonavenue branch at $v \in V(Q)$. Then $v$ is the parent of the root of $P$ in $T^*$. 

Given an $\mathcal{A}(T)$ and one corresponding avenue tree $\mathcal{L}(T)$, we design the 
following algorithm to construct an optimal node-search strategy of $T$. 

procedure SEARCH(v); {v is the root of L(T)} 
    Let (v_1, v_2, ..., v_r) be the sequence such that tag(v_i) = tag(v); 
    for i := 1 to r do begin 
        place a searcher on v_i; 
        if i > 1 then remove the searcher from v_{i-1}; 
        for all children y of v_i with tag(y) < tag(v) do SEARCH(y) 
    end; 
    remove the searcher from v_r 
end SEARCH; 

Theorem 5. Given an avenue tree $\mathcal{L}(T)$ rooted at $v$, Algorithm SEARCH($v$) 
constructs an optimal node-search strategy of $T$ in linear time. 
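
The procedure SEARCH can be sketched in executable form under a few assumptions that the text does not spell out: the avenue tree is given as a dictionary `children` of child lists, `tag` maps vertices to tag values, and the sequence (v_1, ..., v_r) is taken to be the vertices of the avenue through v listed in path order. All names below are hypothetical, and this is an illustration rather than the authors' implementation.

```python
def search(v, children, tag, moves=None):
    """Emit the moves of SEARCH(v) as ("place", x) / ("remove", x) tuples."""
    if moves is None:
        moves = []

    def chain(x):
        # Follow same-tag children downward: one arm of the rooted path.
        arm = []
        while x is not None:
            arm.append(x)
            x = next((c for c in children.get(x, []) if tag[c] == tag[v]), None)
        return arm

    same = [c for c in children.get(v, []) if tag[c] == tag[v]]
    left = chain(same[0]) if len(same) > 0 else []
    right = chain(same[1]) if len(same) > 1 else []
    avenue = left[::-1] + [v] + right      # assumed order v_1, ..., v_r

    for i, x in enumerate(avenue):
        moves.append(("place", x))
        if i > 0:
            moves.append(("remove", avenue[i - 1]))
        for y in children.get(x, []):
            if tag[y] < tag[v]:            # roots of nonavenue branches
                search(y, children, tag, moves)
    moves.append(("remove", avenue[-1]))
    return moves


def max_searchers(moves):
    """Largest number of searchers simultaneously on the graph."""
    on, peak = set(), 0
    for action, x in moves:
        if action == "place":
            on.add(x)
        else:
            on.discard(x)
        peak = max(peak, len(on))
    return peak


# Star with centre c; the extended avenue [x, c, y] has tag 2, leaf z has tag 1.
star_children = {"c": ["x", "y", "z"]}
star_tag = {"c": 2, "x": 2, "y": 2, "z": 1}
star_moves = search("c", star_children, star_tag)
print(star_moves, max_searchers(star_moves))
```

The recursion keeps the searcher on v_i while each nonavenue branch hanging off v_i is searched, which is why a branch with a smaller node-search number never pushes the total above tag(v).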

3 Constructing an optimal node-search strategy 

In the following, $T$ is a rooted tree. The node-search number of a rooted tree is 
the same as the node-search number of its underlying unrooted tree. Let $T[u]$ 
denote the subtree of $T$ rooted at $u$. Let $T[u, v_1, v_2, \ldots, v_i]$ denote the resulting 
tree of removing the subtrees rooted at $v_1$ through $v_i$ from $T[u]$. A vertex $x$ is 
$k$-critical in a rooted tree $T$ if $ns(T[x]) = k$ and there are two children $y$ and $z$ 
of $x$ such that $ns(T[y]) = ns(T[z]) = k$. 

For each vertex $u \in V(T)$, the label of $u$ is a list of integers $(a_1, \ldots, a_p)$, where 
$a_1 > a_2 > \cdots > a_p \ge 1$, and each $a_i$ is associated with a vertex $u_i$, $1 \le i \le p$, 
such that the following conditions hold. 

1. $ns(T[u]) = a_1$. 

2. For $1 \le i < p$, $ns(T[u, u_1, \ldots, u_i]) = a_{i+1}$. 

3. For $1 \le i < p$, $u_i$ is an $a_i$-critical vertex in $T[u, u_1, \ldots, u_{i-1}]$. 

4. $u_p$ is $u$. If $a_p$ is marked with a prime "$'$" then there is no $a_p$-critical vertex in 
$T[u, u_1, \ldots, u_{p-1}]$. If $a_p$ is not marked with a prime, then $u_p$ is an $a_p$-critical 
vertex in $T[u, u_1, \ldots, u_{p-1}]$. 

A critical element is an element of the label that is associated with a critical 
vertex. Note that the prime marker is used only on the last element, since all 
the others are necessarily critical. 

Let $I = [a, b]$, where $a \ge b$ and $a$, $b$ are integers, represent the list of 
consecutive integers $(a, a - 1, \ldots, b)$. Then the label $\lambda = (a_1, \ldots, a_p)$ can be 
represented as a list of intervals $(I_1, I_2, \ldots, I_r)$, where $I_i$ represents a maximal 
set of consecutive integers in $\lambda$ for all $1 \le i \le r$ [9]. In $\lambda$'s integer 
representation $(a_1, \ldots, a_p)$, let $u_i$ be the vertex associated with $a_i$ for 
$1 \le i \le p$. In $\lambda$'s interval representation $(I_1, \ldots, I_r)$, only the vertices 
associated with the endpoints of intervals are recorded. 
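
As a small illustration of the interval representation, the sketch below converts a strictly decreasing integer label into its list of maximal intervals. The function name `to_intervals` is hypothetical, and the prime markers and associated vertices are deliberately left out.

```python
def to_intervals(label):
    """Convert a strictly decreasing integer list into maximal intervals.

    Each interval is a pair (a, b) with a >= b standing for a, a-1, ..., b.
    """
    intervals = []
    for a in label:
        if intervals and intervals[-1][1] == a + 1:
            intervals[-1][1] = a           # extend the current run downward
        else:
            intervals.append([a, a])       # start a new run
    return [tuple(i) for i in intervals]


print(to_intervals([7, 6, 5, 3, 1]))
```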

For convenience, a label is also treated as a set. For a label $\lambda$ and an integer 
$k$ in the interval representation, we say $k \in \lambda$ if there exists an interval $[a, b] \in \lambda$ 
such that $a \ge k \ge b$. Let $cut(\lambda, k)$ be the label resulting from deleting all elements 
which are no greater than $k$ from $\lambda$. Let $\min(\lambda)$ (respectively, $\max(\lambda)$) be the 
smallest (respectively, largest) element of label $\lambda$. For any two trees $T_1 = (V_1, E_1)$ 
and $T_2 = (V_2, E_2)$, $T_1 \cup T_2 = (V_1 \cup V_2, E_1 \cup E_2)$. 

For a subtree $T[u]$ we first compute the label $\lambda$ of $u$ and then decide the 
tag value of $u$ according to $\lambda$. By the tag value of $u$ and the labels of its children, 
an avenue tree $\mathcal{L}(T[u])$ can be constructed. Let $\&$ denote the concatenating 
operation in label combination. For simplicity, we only define the concatenating 
operation in integer representation. In our algorithm, the concatenation 
$\lambda = \lambda_1 \& \lambda_2$ only occurs when $\min(\lambda_1) \ge \max(\lambda_2)$. If $\min(\lambda_1) > \max(\lambda_2)$, 
then $\lambda = \lambda_1 \cup \lambda_2$; otherwise we recursively compute 
$\lambda = cut(\lambda_1, \max(\lambda_2)) \& (\max(\lambda_2) + 1)$. The details of our algorithm are as follows. 
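
The cut and concatenation operations on integer-represented labels can be sketched as follows. This reading of the definition is an assumption on my part: labels are plain strictly decreasing lists, prime markers and associated vertices are ignored, and an empty left operand (which can arise after a cut) is taken to yield the right operand. The names `cut` and `concat` are illustrative.

```python
def cut(label, k):
    """Delete all elements that are no greater than k."""
    return [a for a in label if a > k]


def concat(l1, l2):
    """Sketch of the '&' operation in integer representation.

    Precondition (as stated in the text): min(l1) >= max(l2).
    """
    if not l1:                      # assumption: nothing left to combine with
        return list(l2)
    if not l2:
        return list(l1)
    if min(l1) > max(l2):
        return list(l1) + list(l2)  # plain union, still strictly decreasing
    if min(l1) < max(l2):
        raise ValueError("concat requires min(l1) >= max(l2)")
    k = max(l2)                     # equal case: carry k over to k + 1
    return concat(cut(l1, k), [k + 1])


print(concat([5, 3], [3]))
print(concat([5, 4, 3], [3]))
```

In the equal case the collision behaves like a carry in addition: the colliding value is replaced by its successor, and the carry propagates while it keeps colliding with the remaining elements of the left label.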



function AVTREE(T: unrooted tree): rooted tree; 
    Initially, V(T*) = V(T) and for each vertex u ∈ V(T*), u → u in T*; 
    Choose some vertex u in T and make u the root of T; 
    λ := ∅; /* λ is the label of u and, initially, it is empty */ 
    BUILD_AVTREE(T, u, λ, T*); 
    AVTREE := T* 
end AVTREE; 

procedure BUILD_AVTREE(T: tree; u: vertex; λ: label; T*: tree); 
    if u is the only vertex in the tree T[u] then 
        λ := ([1', 1']) and ntag(u) := 1 
    else 
        for all vertices v_i, 1 ≤ i ≤ d, the d children of u, do 
            Let T_i* be the subforest of T* induced by V(T[v_i]); 
            λ_i := ∅; /* λ_i is the label of v_i */ 
            BUILD_AVTREE(T, v_i, λ_i, T_i*) 
        endfor; 
        λ := COMBINE_LABELS(λ_1, λ_2, ..., λ_d); 
        ntag(u) := min(λ); 
        T* := COMBINE_AVTREES(u, λ_1, ..., λ_d, T_1*, ..., T_d*) 
    endif 
end BUILD_AVTREE; 

function COMBINE_LABELS(λ_1, λ_2, ..., λ_d: label): label; 
    if there is one or more label containing 1 then λ := ([2', 2']) else λ := ([1', 1']); 
    Let m be the second largest element in {max(λ_1), max(λ_2), ..., max(λ_d)}; 
    for k := 2 to m do 
        Let n be the number of labels containing an element k; 
        Case 1: {n ≥ 3} λ := ([(k+1)', (k+1)']); 
        Case 2: {n = 2 and at least one element k is critical} λ := ([(k+1)', (k+1)']); 
        Case 3: {n = 2 and neither element k is critical} λ := ([k, k]); 
        Case 4: {n = 1 and element k is critical and k ∈ λ} λ := ([(k+1)', (k+1)']); 
        Case 5: {n = 1 and element k is critical and not (k ∈ λ)} λ := ([k, k]) & λ; 
        Case 6: {n = 1 and element k is not critical} λ := ([k', k']) 
    endfor; 
    Let λ_s be the label with max(λ_s) > m and μ = cut(λ_s, m); 
    COMBINE_LABELS := μ & λ 
end COMBINE_LABELS; 

function COMBINE_AVTREES(u: vertex; λ_1, ..., λ_d: labels; T_1*, ..., T_d*: trees): tree; 
    T* := T_1* ∪ ... ∪ T_d* ∪ {the rooted tree consisting of the single vertex u}; 
    Let m be the second largest element in {max(λ_1), max(λ_2), ..., max(λ_d)}; 
    for k = 1 to m do 
        for each λ_i with k ∈ λ_i do 
            Case 1: {k < ntag(u) and k is a left endpoint of λ_i} 
                /* this case decides the children of u */ 
                Let v be the vertex associated with k and v → w; 
                if ntag(w) > ntag(u) or w = v then v → u; u' = u; 
            Case 2: {k = ntag(u) and k is an endpoint of λ_i} 
                /* this case decides the rooted path (u is the root) */ 
                Let v be the vertex associated with k and v → w; 
                Case 2.1: {k is a left endpoint} v → u; 
                Case 2.2: {k is a right endpoint} v → u; u → w; 
            Case 3: {k > ntag(u) and k is an endpoint of λ_i} 
                /* this case decides the ancestors of u */ 
                Let v be the vertex associated with k; 
                Case 3.1: {k is a left and right endpoint} u' → v; u' = v; 
                Case 3.2: {k is a right endpoint} u' → v; 
                Case 3.3: {k is a left endpoint} u' = v 
        endfor 
    endfor; 
    Let λ_s be the label with max(λ_s) > m and μ = (..., I_{r-1}, I_r) = cut(λ_s, m); 
    A: if min(μ) is a right endpoint of λ_s then 
        /* the remaining work of Case 3 */ 
        Let v be the vertex associated with min(μ); 
        u' → v 
    B: else /* m ∈ λ_s, this implies ntag(u) > m */ 
        /* decide the children of u */ 
        Let v be the vertex associated with the left endpoint of I_r; 
        v → u; 
    C:  if r > 1 then 
            Let w be the vertex associated with the right endpoint of I_{r-1}; 
            u → w 
        endif 
    endif; 
    COMBINE_AVTREES := T* 
end COMBINE_AVTREES; 

The function COMBINE_LABELS, presented in [9], computes the labels 
correctly. As proved in [9], using the interval representation of labels, 
$ns(T)$ can be computed in linear time. 

Lemma 6. For any vertex $u \in V(T)$, let $(a_1, \ldots, a_p)$ be the label of $u$ and $u_i$ be 
the vertex associated with $a_i$ for $1 \le i \le p$. Let $T^0$ denote the tree $T[u]$ and $T^i$ 
denote the subtree $T[u, u_1, \ldots, u_i]$ for $1 \le i < p$. Then the following statements 
hold. 

1. For each critical $a_i$, $1 \le i \le p$, $u_i$ is in any extended avenue of $T^{i-1}$. 

2. is a nonavenue braneh at Ui inT''~^ , for 1 < i < p. 

3. Let e* = r-^\r, l<i<p. Then A{T°) = U A{TP-^). 

Lemma 7. Let $u$ be a vertex in the rooted tree $T$ with the label $\lambda$. Let $a_p = 
\min(\lambda)$. Let $v_i$ with the label $\lambda_i$, $1 \le i \le d$, be the $d$ children of $u$. For each $i$, 
$1 \le i \le d$, let $W_i = cut(\lambda_i, a_p)$. Then the following statements hold. 

1. $W_i$, $1 \le i \le d$, are mutually disjoint. 

2. $cut(\lambda, a_p) = \bigcup_{i=1}^{d} W_i$. 



Lemma 8. The ntag values computed by Algorithm AVTREE($T$) determine 
an extended avenue system of $T$. 
Lemma 9. Let $\mathcal{A}(T)$ be the avenue system of $T$ determined by AVTREE($T$) 
and $u$ be the root of $T$. Let the label of $u$ be $\lambda = (a_1, \ldots, a_p)$ and $u_i$ be the vertex 
associated with $a_i$ for $1 \le i \le p$. Let $P_i \in \mathcal{A}(T)$ be the avenue containing $u_i$. 
Then $P_{i-1} = \min(\mathcal{A}_{P_i})$. 

Lemma 10. Algorithm AVTREE($T$) constructs an avenue tree $T^*$ of $T$. 
Proof: It suffices to prove that $T^*$ constructed by AVTREE($T$) satisfies the 
conditions in Lemma 4. We consider $T$ as a rooted tree with root at $u$. We prove 
this lemma by induction on the height of $T$. If $T$ is of height 1, then it is a 
star graph. In AVTREE($T$), since $ntag(u) = 2$ and every child of $u$ has label 
$([1', 1'])$, only Case 1 of COMBINE_AVTREES is executed. Thus for every child $v$ of 
$u$, $v \to u$. It is easy to verify that $T^*$ satisfies the conditions of Lemma 4. Assume 
that for all rooted trees $T$ of height less than $k$, the rooted tree $T^*$ constructed by 
AVTREE($T$) satisfies the conditions of Lemma 4. Now we consider a tree $T$ of 
height $k$. Let the label of $u$ be $\lambda = (a_1, \ldots, a_p)$ and $u_i$ be the vertex associated 
with $a_i$ for $1 \le i \le p$. Let $v_i$ with the label $\lambda_i = (a_{i,1}, \ldots, a_{i,p_i})$, $1 \le i \le d$, be 
the children of $u$. Let $u_{i,j}$ be the vertex associated with $a_{i,j}$ for all $1 \le i \le d$ and 
$1 \le j \le p_i$. 

By Lemma 8, the ntag values computed by AVTREE($T$) determine an 
extended avenue system $\mathcal{A}(T)$ of $T$. Let $Q \in \mathcal{A}(T)$ be the avenue containing $u$. 
Let $P \in \mathcal{A}(T)$ be any avenue not containing $u$. Thus, $P$ is in $T[v_t]$ for some 
$1 \le t \le d$ and $P$ must be in . . . , or $T[u_{t,s}]$ for some $s$, 
$1 \le s \le p_t$. By the induction hypothesis on $T[v_t]$, $P$ is a rooted path in $T[v_t]^*$. 
Hence $P$ is also a rooted path in $T^*$. That is, $P$ satisfies Lemma 4(1) in $T^*$. In 
the following, we show that $P$ satisfies Lemma 4(2) in $T^*$. 

We first consider the case that $P$ does not contain any $u_{t,s}$, $1 \le s \le p_t$. Let 
$P$ be contained in $T[u_{t,s}]$ for some $s$. Since $ntag(P) < a_{t,s}$, all the avenues in 
$\mathcal{A}_P$ are contained in $T[u_{t,s}]$. That is, $\min(\mathcal{A}_P)$ is also in $T[u_{t,s}]$. By the induction 
hypothesis, $P$ satisfies Lemma 4(2) in $T[v_t]^*$. Since the pointer of the root of $P$ 
does not change in COMBINE_AVTREES, $P$ satisfies Lemma 4(2) in $T^*$. 

Next, we consider the case that $P$ contains $u_{t,s}$ for some $s$. Let $P_{t,s}$ denote 
the avenue in $\mathcal{A}(T)$ containing $u_{t,s}$. 

Case 1: ntag{Pi^s) < ntag[u). In this case, Alp^ ^ contains Q and every avenue 
in Alpi 3 except Q is contained in T[vt]. Let be the largest element in Xt 
such that at^s' < tag[Q). Then min(Alp^ ^ ), s Y is contained in T[vt\. For Ft^sy 
s Y by the induction hypothesis on T[vt]^ Ft^s satisfies Lemma 4(2) in T[vi]*, 
The root of Ft^s has pointed to the right vertex of min(Alp^ 3 ) in T[vt\*. Since 
the pointer does not change in COMBINE_AVTREES^ Ft^s satisfies Lemma 4(2) 
in T*. For Ft^s'^ since > ntag[Q)^ Q = min(Alp^^,). In Case 1 or B: of 

COMBINE_AVTREES ^ the parent of the root of Fi s> is assigned to be w. Hence 
Ft^s' satisfies Lemma 4(2) in d'*. 

Case 2: ntag{Fi^s) > ntag[u). Since ntag{u) = a^, Ut^s ^ • • • y'^p-i} by 

Lemma 7. Let ut^s — By Lemma 9, the parent of up is Uk-\, If ap (i.e., at^s) 
is not a left endpoint in A^, then ap_i{= F 1) G A^. That is, up_i is also 
contained in T[vf]. By Lemma 9, the avenue contains Uk-\ is min(Alp^^). By 
induction hypothesis on T[vf]^ Pt^s satisfies Lemma 4(2) in T[vfY . The root of 
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Ft^s has pointed to the right vertex of min(Al/j^ 3 ) in T[vt\^ . Since the pointer 
does not change in COMBINE_AVTREES ^ Ft^s satisfies Lemma 4(2) in T*. If 
ak is a left endpoint in A^, then by Lemma 7 ak-i must be a right endpoint 
in some Xi^ I < I ^ t < d. Then at^s is determined in Cases 3.1 or 3.3 of 
COMBINE_AVTREES and (i.e., ak-i) is determined in Cases 3.1, 3.2 or 
A: of COMBINE-AVTREES . The pointer ut^s is assigned in Cases 3.1, 

3.2 or A: of COMBINE.AVTREES. Hence Ft,s satisfies Lemma 4(2) in 

Finally we consider the avenue Q. There are six cases in determining of A 
in COMBINE_LABELS. Case 5 can not happen. In Cases 1, 2, and 4, since there 
is no child of u whose label contains = [u] and therefore Q is a rooted path. 

In Case 3, there are exact two children of n, say Vi and such that G A^, 
Up G Xj , and is not critical in A^ and Xj . Let Qi be the avenue containing Vi 
in A{T[vi]). Then V{Q) = V{Qi) UV{Qj) U {n}. By the induction hypothesis, 
the root of Qi is Vi. The Case 2 of COMBINE^AVTREES makes Q be a rooted 
path with root u in T*. In Case 6, there is exact one child Vt of n, say Vt^ such 
that Up G Xt and Up is not critical in A^. Let Qt be the avenue containing Vt in 
A{T[vt\). Then V[Q) = V[Qt) U {u}. By induction hypothesis on T[vt]^ vt is 
the root of Qt in T[vtY . In Case 2 of COMBINE_AVTREES^ the pointer of vt 
is assigned to u. Hence Q satisfies Lemma 4(1). Next, we prove that Q satisfies 
Lemma 4(2). We consider F being the avenue which contains Up_i. By Lemma 
9, /^ = min(AlQ). By Lemma 7, only one child of u whose label contains ap_i. 
If both Up and ap_i are contained in Xt for some t, then the pointer u Up-± 
in 7'* is done in Case 2.2 of COMBINE.AVTREES since u is the root of Q. 
Otherwise, ap_i is a right endpoint in A^. The pointer u ^ Up-i in T* is done in 
one of the Cases 3.1, 3.2, A:, and C: of COMBINE_AVTREES. Hence Q satisfies 
Lemma 4(2) in 7'*. Thus, our lemma follows. □ 

It can be verified that the time complexities of COMBINE_AVTREES and 
COMBINE_LABELS have the same order. Since COMBINE_AVTREES only 
uses the information of the interval representation of labels, the time complexity 
of Algorithm SEARCH($v$) is linear. Together with Theorem 5, we have the 
following theorem. 

Theorem 11. An optimal node-search strategy of a tree can be constructed in 
linear time. 

Abstract. This paper studies a graph optimization problem occurring 
in virtual colonoscopy, which concerns finding the central path of a colon 
model created from helical computed tomography (CT) image data. The 
central path is an essential aid for navigating through complex anatomy 
such as the colon. Recently, Ge et al. [GSZ+] devised an efficient method for 
finding the central path of a colon. The method first generates colon data 
from a helical CT data volume by image segmentation. It then generates 
a 3D skeleton of the colon. In the ideal situation, namely, if the skeleton 
does not contain branches, the skeleton will be the desired central path. 
However, almost always the skeleton contains extra branches caused by 
holes in the colon model, which are artifacts produced during image seg- 
mentation. To remove false branches, we formulate a graph optimization 
problem and justify that the solution of the optimization problem repre- 
sents the accurate central path of a colon. We then provide an efficient 
algorithm for solving the problem. 



1 Introduction 

Virtual endoscopy is a new medical technology that allows physicians to examine 
computer simulations of patients’ anatomy rendered from CT scans. It combines 
medicine, clinical experience, radiology, image processing, computer algorithms, 
and applied mathematics to provide the public with alternative medical proce- 
dures that are less painful, less costly, and less risky compared to conventional 
endoscopic procedures. For instance, medical research has shown that small colon 
polyps, the precursor to colon cancer, can be detected with virtual endoscopy 
[VGB94,VS94]. 

When applying virtual endoscopy to complex anatomy such as the colon, users 
often find it difficult to keep track of their position and orientation inside the 
complex colon image. As a consequence, a part of the colon lumen may be left 
without inspection. Hence, one would like to have a tour guide for traveling through 
a virtual colon. Finding the central path through the lumen of a colon provides 
a natural solution. The central path can be used, for example, to create a movie 
that displays the internal views of the colon lumen generated automatically along 
the path; and to guide the user to walk through, by using a computer mouse, 
the colon lumen without getting lost in the virtual space. In addition to aiding 
navigation, the central path also provides a vital component for more advanced 
processing and visualization of the virtual anatomy. For example, one can slice 
the virtual colon into segments of similar length and split each segment into two 
halves based on a curvilinear cutting plane that passes through the central path, 
which provides a clear visualization with a single view [VSH+96]. 



Several methods have been proposed to determine the central path of a vir- 
tual anatomy. One approach requires that users manually select a number of 
points along the pathway for constructing a central path 
[VS94,HJR+96,SAN+96]. Using this method, users often need to spend a 
considerable amount of time to explore and understand the data volume prior to 
placing the points, which makes the method less desirable for routine clinical 
applications. Another approach offers automatic methods [HKW95,LJK97], but 
these methods can only find a central path (or a path close to the center) in the 
colon lumen for limited cases. These methods fail when the colons have complex 
shapes. In some cases, a part of the small bowel may also be included in the 
colon data, which makes the case even more difficult to analyze. To overcome 
these obstacles, Ge et al. [GSZ+] recently devised an efficient method based on 
the concepts of skeletons. A skeleton of a 3D object is the locus of the centers 
of the largest balls that can fit inside the object; such balls are referred to as 
maximally inscribed balls [Blu73]. Ge et al.’s method consists of four steps. First, 
it generates a 3D image data volume of a colon from helical CT scans by image 
segmentation. Second, it generates a 3D skeleton of the colon image by using an 
improved thinning algorithm based on Li et al.’s thinning algorithm [LKC94]. 
The thinning process preserves the topological constraints and the geometric 
constraints of the original object. (The geometric constraints are the two end- 
points of the colon provided by the user.) This means that if the skeleton does 
not contain branches, then it is the desired central path. However, almost al- 
ways the skeleton contains extra branches caused by holes in the object that 
are artifacts produced during image segmentation. The number of holes may be 
reduced by a finer segmentation; but no segmentation, however fine, seems likely 
to eliminate holes completely. Hence, false branches need to be removed, which 
is the task of the third step of the method. Note that the image segmentation 
process may also produce cavities, each of which corresponds to an isolated point on the 
skeleton, and hence can be removed easily. After the false branches have been 
pruned, the remaining skeleton provides the accurate central path. However, the 
path contains many abrupt direction changes due to the discrete nature of im- 
age data. The last step of the method computes a smooth representation of the 
central path by approximating the final skeleton with B-splines. 
To remove the false branches of the skeleton in Step 3, the skeleton is first 
converted to a connected graph with positive weights, referred to as a skeletal 
graph. Here by connected graph we mean that the graph does not contain isolated 
vertices. In the skeletal graph, a vertex corresponds to a point on the skeleton 
where either three or more branches join, or only one branch joins (in this case, 
the point is an endpoint of the colon), and an edge corresponds to a branch. The 
weight of each edge is determined as follows. Note that each point on a skeleton 
branch is associated with a maximally inscribed ball. It has been observed that 
holes are almost always near the surface [GSZ+], and so the extra branches that 
are induced by these holes usually pass through narrow segments. In other words, 
the desired central path lies in the center of the colon lumen with a relatively 
large diameter. Imagine that each point on the skeleton corresponds to a pipe 
whose size is the maximally inscribed ball. A skeletal branch can then be viewed 
as a connected sequence of pipes in which balls can roll back and forth. The 
smallest pipe along the branch determines the largest ball that can pass through 
this branch. Thus, the radius of the smallest ball along a skeleton branch is used 
as the weight of its corresponding edge in the graph. In case there are multiple 
edges between two vertices (this occurs when a skeletal branch splits into two to 
go around a hole), the edges with smaller weights are removed. 

The skeletal graph of a colon has two endpoints. We want to find a path 
from one endpoint to the other endpoint in the skeletal graph such that its 
minimum weight is the maximum among all paths, and it avoids passing through 
any narrow passage whenever it is possible. Such a path corresponds to the true 
central path of the colon. Ge et al. [GSZ+] formulated this optimization problem 
as a problem of finding a path with the maximum flow, where the flow of a path 
is defined as the minimum weight of all edges on the path. However, in some 
cases, there may be more than one path with the maximum flow. In such a case, 
a heuristic was used in [GSZ+] for finding a central path among all paths with 
the maximum flow. We refine this formulation in this paper and fully justify that 
the solution of the refined optimization problem represents the accurate central 
path. These results are given in Section 2. In Section 3, we present an 
O(n log n)-time algorithm for solving the optimization problem on skeletal graphs, 
as well as a mathematical analysis of the algorithm, where n is the number of 
vertices in the graph. In the Appendix, we present a running example of the algorithm on a 
complex colon case. 

2 Finding the Central Path 

We consider connected graphs with positive weights. Since an undirected graph 
can be simulated by a directed graph, where each edge is represented by two arcs 
traveling in opposite directions, we assume that all graphs are directed. Denote 
by $w(u, v)$ the weight of an edge from $u$ to $v$. 

Let G be a skeletal graph (i.e., the weighted graph converted from a skeleton 
as described in the second last paragraph in Section 1). Let s and t be the two 
endpoints in G. In what follows, we will fix the usage of s and t to denote the 



292 Jie Wang and Yaorong Ge 



two endpoints of the colon image. To search for the central path from s to t, 
we should avoid selecting branches with small weights whenever it is possible. 
There are two subtle issues in formulating a mathematical definition to capture 
the essential features of the central path, and we discuss them below. 

Issue 1: False branches of large weights. In searching for the central path 
from s to t, one may want to use the following greedy strategy; namely, at a 
current vertex $u$, select the next vertex $v$ with the largest $w(u, v)$. This strategy 
works fine if all false branches from vertex $u$ have strictly smaller weights than 
the weight of the true branch from $u$ (here a true branch means a branch 
on the central path). But this strategy fails if a false branch actually has a larger 
weight than that of the true branch on a particular segment, which may occur 
due to limitations of segmentation. Although such cases do not always occur, 
we have encountered such cases during our experiments. Hence, if this greedy 
strategy is used, then a false branch will be selected, which will then lead us to 
other false branches. 

One may then suggest that, if such a case happens, one should
run a finer segmentation and start the whole process over again. However,
the segmentation process and the thinning algorithm are extremely time consuming
because a large data volume is involved: each helical CT volume may
contain up to 200MB of data. Hence, this approach is not desirable. Compared
to the large data volume of the original image, the size of the skeletal graph is
substantially smaller, so we should naturally avoid re-running the whole process.
Note that selecting a false branch with a large weight may lead to another false
branch, which may lead to a false path with a smaller flow than the maximum
flow. Hence, we want to find a path with the maximum flow. We note that there
may be two or more such paths. If x is a common vertex occurring on such paths,
the one that has the largest flow from s to x should be selected.

Let $p = (v_0, v_1, \ldots, v_k)$ be a simple path. Denote by $f(p)$ the flow of p;
namely,

$$f(p) = \min\{w(v_{i-1}, v_i) : 1 \le i \le k\}.$$

We use $v_i \overset{p}{\leadsto} v_j$ ($j > i$) to denote the portion of p from $v_i$ to $v_j$, i.e., the path
$(v_i, \ldots, v_j)$.

Definition 1. Let G be a weighted graph. Let u and v be two vertices. A simple
path p from u to v in G is a largest path if for any other path $p'$ from u to v,
the following two conditions hold.

1. $f(p') \le f(p)$.

2. If x is a vertex shared by both p and $p'$, then $f(u \overset{p'}{\leadsto} x) \le f(u \overset{p}{\leadsto} x)$.

Issue 2: True branches of fluctuating weights. If there is only one largest 
path from s to t, then that path represents the accurate central path. However, 
in some cases, there are two or more largest paths. For example, imagine that a 
colon image contains some segments that are narrower than the other segments. 
Then the narrowest segment may determine the flow of the central path. If after
this narrowest segment the colon image (obtained from image segmentation)
becomes substantially larger, then a false path may also have the same flow as
the central path, because the maximum flow of all paths is already
small. Several heuristics may be used to help identify the accurate central path
among the largest paths. For example, one may suggest finding a largest path p
from s to t such that for any other largest path $p'$ from s to t, $f(p'_i) \le f(p_i)$ for
all i, where $p_i$ (respectively, $p'_i$) represents the portion of p (respectively, $p'$) from
the ith vertex to t. This heuristic works for some cases, but it may fail if the colon image has
a narrow segment that is followed by a wide segment and then by another
narrow segment.

Let $e_1$ and $e_2$ be two edges on a path. We write $e_1 < e_2$ if $e_1$ is reached
before $e_2$. If $e_1$, $e_2$, and $e_3$ are three edges on a path with $e_1 < e_2 < e_3$
such that $w(e_1) < w(e_2)$ and $w(e_2) > w(e_3)$, then we say that the path has
fluctuating weights.

The following is a possible heuristic to handle fluctuating weights:
find a largest path p from s to t such that for any largest path $p'$ from s to t, if
there is an edge $e_1$ on p and an edge $e'_1$ on $p'$ with $w(e'_1) > w(e_1)$, then p must
contain an edge $e_2 > e_1$ such that for some edge $e'_2 > e'_1$ on $p'$, $w(e_2) > w(e'_2)$.
This heuristic works for some cases, but it may fail in others. For
example, assume that $e_1$ and $e_2$ are the last two edges of the true central
path, and e is an edge at the end of a false path such that $w(e) > w(e_1)$ and
$w(e) > w(e_2)$. Then the true central path will not be selected by this
strategy.

To overcome these obstacles, let us imagine that the skeleton of a colon 
represents the density of the colon. In the ideal situation, namely, if at any 
point, all false branches have smaller weights than the true branch, then the 
central path has the heaviest average weight. An average weight of a path is the 
total weight of all edges divided by the number of edges on the path. Note that 
we cannot use the maximum total weight as a criterion because a false branch 
may be long and hence its total weight may be large. In a skeletal graph, almost 
all false branches have small weights. So even if on some segments of a colon, a 
false branch has a larger weight than that of the true branch, the central path 
must still have the heaviest average weight. 

Let $p = (v_0, v_1, \ldots, v_k)$ be a simple path. Denote by $W(p)$ the average weight
of p; namely,

$$W(p) = \frac{1}{k} \sum_{i=0}^{k-1} w(v_i, v_{i+1}).$$
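The two path measures used in this section, the flow $f(p)$ and the average weight $W(p)$, can be illustrated with a minimal Python sketch. The function names, the dictionary-based edge weights, and the toy graph below are assumptions made for illustration only; they are not taken from the authors' implementation.

```python
def flow(path, w):
    """f(p): the smallest edge weight along the path (its bottleneck)."""
    return min(w[(u, v)] for u, v in zip(path, path[1:]))


def average_weight(path, w):
    """W(p): total edge weight divided by the number of edges on the path."""
    edges = list(zip(path, path[1:]))
    return sum(w[e] for e in edges) / len(edges)


def total_weight(path, w):
    """Sum of edge weights, shown only to contrast with the average."""
    return sum(w[(u, v)] for u, v in zip(path, path[1:]))


# Hypothetical toy weights: a short wide branch and a long thin branch.
w = {
    ("s", "a"): 4, ("a", "t"): 6,
    ("s", "b"): 3, ("b", "c"): 3, ("c", "d"): 3, ("d", "t"): 3,
}
true_path = ["s", "a", "t"]
false_path = ["s", "b", "c", "d", "t"]
```

In this toy case the long branch has the larger total weight (12 against 10) but the smaller flow (3 against 4) and the smaller average weight (3 against 5), which is the reason given above for preferring the average over the total.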

Definition 2. Let G be a weighted graph. Let u and v be two vertices. A simple
path p from u to v in G is a heaviest path if for any other path $p'$ from u to v,
$W(p') \le W(p)$.

In some cases a path with the heaviest weight may imply that it is also a 
largest path. But it is not always true because a heaviest path may contain an 
edge with very small weight. Hence, we want to find a path that is the heaviest 
among the largest paths. This gives rise to the following definition. 
Definition 3. Let G be a weighted graph. Let u and v be two vertices. A simple
path p from u to v in G is a critical path if p is a largest path from u to v, and
for any largest path $p'$ from u to v, $W(p') \le W(p)$.

It is easy to see that if there is a path from u to v, then there must be a critical
path from u to v. Recall that for any vertex u in a skeletal graph G, false branches
from u almost always have strictly smaller weights than the true branch
from u. This implies that if a path is the heaviest among the largest paths, then
it represents the accurate central path. Any reasonable segmentation guarantees
that such a path is unique. (But we note that in a general weighted graph, there
may be more than one critical path.) In practical terms, the central path can be
found by solving the following optimization problem.

Central Path Problem 

Input: A weighted graph G and two vertices s and t. 

Output: A critical path from s to t. 

We present a fast algorithm for solving this problem in the next section. 



3 A Fast Algorithm 



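Let $G = (V, E)$ be a connected graph with positive weights. Let s and t be two
vertices. We want to find a critical path from s to t. Let

$$\Delta(u, v) = \begin{cases} \max\{f(p) : u \overset{p}{\leadsto} v\}, & \text{if there is a path from } u \text{ to } v, \\ -1, & \text{otherwise,} \end{cases}$$

$$\Gamma(u, v) = \begin{cases} \max\{W(p) : u \overset{p}{\leadsto} v \text{ and } f(p) = \Delta(u, v)\}, & \text{if there is a path from } u \text{ to } v, \\ 0, & \text{otherwise.} \end{cases}$$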



Lemma 1. Let G be a weighted graph. Let u and v be two vertices. Then a
path p from u to v is a critical path if and only if for every vertex x on p,
$f(u \overset{p}{\leadsto} x) = \Delta(u, x)$ and $W(u \overset{p}{\leadsto} x) = \Gamma(u, x)$.

We observe that in a skeletal graph, the number of edges is of the same
asymptotic order as the number of vertices. Hence, we use an adjacency list Adj
to represent G to save memory. Note that although G does not contain
isolated vertices, this does not mean that for any pair of vertices u and v, there
always exists a path from u to v. But in a skeletal graph G, there is always a
path from s to u for any vertex u. We present an algorithm that can also be
used to handle non-connected graphs. For each vertex $u \in V$, we maintain four
attributes $d[u]$, $\pi[u]$, $l[u]$, and $a[u]$ in the algorithm, where $d[u]$ represents the
flow from s to u, $\pi[u]$ returns the predecessor of u on the current path, $l[u]$
represents the number of edges from s to u on the current path through the $\pi$
attributes, and $a[u]$ represents the average weight of the current path from s
to u.
Critical_Path(G, s, t)

1. for each vertex u ∈ V do
       d[u] ← −1, π[u] ← NIL, l[u] ← 0, a[u] ← 0
   endfor
   d[s] ← ∞, S ← ∅, Q ← V

2. while Q ≠ ∅ do
   (a) u ← Extract_Max(Q)
       if d[u] ≠ −1 then S ← S ∪ {u} else goto Step 3
   (b) for each vertex v ∈ Adj[u] do
           if (d[v] < min{d[u], w(u, v)}) then
               d[v] ← min{d[u], w(u, v)}
               π[v] ← u, l[v] ← l[u] + 1
               a[v] ← (l[u] · a[u] + w(u, v)) / l[v]
           if (π[v] ≠ NIL and min{d[u], w(u, v)} ≥ d[v]) then
               if ((l[u] · a[u] + w(u, v)) / (l[u] + 1) > a[v]) then
                   π[v] ← u, l[v] ← l[u] + 1
                   a[v] ← (l[u] · a[u] + w(u, v)) / l[v]
       endfor
   endwhile

3. if t ∈ S then output the path from s to t using the π attributes
   else there is no path from s to t

Part 1 of Critical_Path(G, s, t) is for initialization. The procedure
Extract_Max(Q) in 2(a) finds an element $u \in Q$ that has the largest $d[u]$.
For each vertex v in Adj[u], if v has not been visited, namely, $\pi[v]$ = NIL, then
the first if-statement of 2(b) updates $d[v]$. If v has been visited, then the second
if-statement of 2(b) checks whether the current path has a flow at least as large
as the flow of the old path. If the answer is yes, the third if-statement of 2(b)
checks whether the current path has a strictly larger average weight than that
of the old path; if so, the old path is switched to the current
path by changing the $\pi$ attribute. The formula $(l[u] \cdot a[u] + w(u, v)) / (l[u] + 1)$
calculates the average weight of the current path from s to v via u.
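The procedure can be sketched in Python as follows, assuming a binary heap with lazy deletion plays the role of Extract_Max. The function name, the adjacency-dictionary input format, and the toy graph are assumptions for illustration. As a further assumption that is not part of the pseudocode, vertices that have already been extracted are never updated again; this keeps the predecessor links acyclic when arcs exist in both directions, at the price of ignoring a flow tie that is discovered only after a vertex has been extracted. The sketch is therefore an approximation of the procedure, not a verified solver.

```python
import heapq
import math


def critical_path(adj, s, t):
    """adj maps every vertex to a list of (neighbor, weight) arcs."""
    d = {u: -1 for u in adj}      # best flow found so far from s
    pi = {u: None for u in adj}   # predecessor on the current path
    l = {u: 0 for u in adj}       # number of edges on the current path
    a = {u: 0.0 for u in adj}     # average weight of the current path
    d[s] = math.inf
    done = set()                  # the set S of extracted vertices
    heap = [(-d[s], s)]           # max-heap simulated with negated keys
    while heap:
        neg_d, u = heapq.heappop(heap)
        if u in done or -neg_d != d[u]:
            continue              # stale heap entry, skip it
        done.add(u)
        for v, weight in adj[u]:
            if v in done:
                continue          # added assumption: settled vertices stay fixed
            cand_flow = min(d[u], weight)
            cand_avg = (l[u] * a[u] + weight) / (l[u] + 1)
            # Larger flow wins; on equal flow, larger average weight wins.
            if cand_flow > d[v] or (cand_flow == d[v] and cand_avg > a[v]):
                d[v], pi[v] = cand_flow, u
                l[v], a[v] = l[u] + 1, cand_avg
                heapq.heappush(heap, (-cand_flow, v))
    if t not in done:
        return None               # no path from s to t
    path = [t]
    while path[-1] != s:
        path.append(pi[path[-1]])
    return path[::-1]


# Hypothetical graph where the greedy choice (s -> a, weight 9) is a trap.
greedy_trap = {
    "s": [("a", 9), ("b", 5)],
    "a": [("s", 9), ("t", 1)],
    "b": [("s", 5), ("t", 5)],
    "t": [("a", 1), ("b", 5)],
    "x": [],
}
```

On the toy graph, the greedy strategy of Issue 1 would leave s through the arc of weight 9 and end with flow 1, whereas the sketch returns the path through b, whose flow is 5.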

Lemma 2. When the algorithm Critical_Path(G, s, t) terminates, we have
$d[u] = \Delta(s, u)$ for all vertices $u \in V - \{s\}$. Moreover, if $\pi[u] \ne$ NIL, the path p
from s to u obtained from the $\pi$ attributes is a largest path.

Lemma 3. When the algorithm Critical_Path(G, s, t) terminates, we have
$a[u] = \Gamma(s, u)$ for all vertices $u \in V - \{s\}$. Moreover, if $\pi[u] \ne$ NIL, the path p
from s to u obtained from the $\pi$ attributes is a critical path.

It follows from Lemmas 2 and 3 that the algorithm Critical_Path(G, s, t) returns
a critical path from s to t. Namely, we have proven the following theorem.

Theorem 1. Assume that there is a path from s to t in G. Then when the algorithm
Critical_Path(G, s, t) terminates, the path from s to t obtained from
the $\pi$ attributes is a critical path.
Next, we analyze the time complexity of the algorithm.

Theorem 2. Depending on how the Extract_Max operation is implemented,
the algorithm Critical_Path(G, s, t) runs in time $O(n^2)$ or $O((n + |E|) \log n)$,
where $|V| = n$. Hence, when $|E| = O(n)$, the algorithm can be run in $O(n \log n)$
time.

Since in a skeletal graph G, $|E| = O(|V|)$, it follows from Theorems 1 and 2
that finding the central path of the colon lumen can be carried out in $O(n \log n)$
time, where $|V| = n$.
Appendix: Running Examples

We have tested our algorithm on a number of virtual endoscopy colon cases. Each 
helical CT volume consists of up to 500 images with 512 x 512 pixels per image. 
We present in this section a running example of our algorithm on a complex 
colon case. 

Figure 1(a) shows a colon that has been segmented from a CT volume and 
rendered for visualization. Notice that the segmentation result is inexact. A large 
portion of small bowel has been segmented in addition to the colon. We use this 
example to demonstrate the robustness of our algorithms. 

The skeleton resulting from the 3D thinning algorithm is shown in Figure 1(b),
superimposed on the original colon rendering. Notice that multiple
colon segments touch each other, and that a portion of the small bowel touches
the transverse colon. These touching segments result in segmentation artifacts
known as holes and cause many extra branches in the skeleton.

Figure 1(c) shows the true central path extracted from the initial skeleton 
(dotted line) and its B-spline approximation (solid line). 

Figure 2 shows two internal renderings of the colon lumen from two positions 
on the central path. Note that these are views of the colon surface from inside. 
The small holes that we see in these views are not the holes in the segmented 
colon object. Rather, they are artificial tunnels connecting colon segments that 
create holes visible from external views. These tunnels are where the extra skeletal 
branches pass through in order to go around the holes in the object. 

The time required to execute our algorithms varies with the type of platform,
the size of the input volume, and the complexity of the colon itself. On an SGI system
with an R10000 processor (Silicon Graphics, Inc., Mountain View, CA), the time required to
convert a skeleton to a graph and to search for the true central path never
exceeded 15 seconds.
Fig. 1. (a) Rendering of the segmented colon. Touching segments of colon and
inclusion of a portion of small bowel are due to limitations in the segmentation
process. (b) The initial skeleton preserves the original colon topology. The many
extra branches that deviate from the center of the colon are caused by holes in
the original object. (c) The central path and its smooth B-spline approximation.
The dotted line represents the central path from the original skeleton and the
solid line is its B-spline approximation.




Fig. 2. Two internal renderings of the colon lumen as seen from positions along
the central path. The central path is projected into the internal views for
illustration.
Abstract. Constructing minimum ultrametric trees from distance matrices
is an important problem in computational biology. In this paper,
we examine its computational complexity and approximability. When
the distances satisfy the triangle inequalities, we show that the minimum
ultrametric tree problem can be approximated in polynomial time
with error ratio $1.5(1 + \lceil \log n \rceil)$, where n is the number of species. We
also developed an efficient branch and bound algorithm for constructing
the minimum ultrametric tree for both metric and nonmetric inputs.
The experimental results show that it can find an optimal solution for
25 species within reasonable time, while, to the best of our knowledge,
there is no report of algorithms solving the problem even for 12 species.
Keywords: computational biology, ultrametric trees, approximation algorithms,
branch and bound.

1 Introduction 

Constructing evolutionary trees from distances is an important problem in biology
and in taxonomy, and there are many different models to define the problems
[11,14]. Most of the optimization problems of evolutionary tree construction have
been shown to be NP-hard [3,4,5,6,8,9,13]. An important model is to assume that the
rate of evolution is constant. With this assumption, the evolutionary tree will
be an ultrametric tree [11,14]. An ultrametric tree is a rooted, leaf labeled, and
edge weighted binary tree in which every internal node has the same path length
to all the leaves in its subtree. We only need to consider binary trees since a general
ultrametric tree can be converted to a binary tree by replacing any vertex
of degree larger than 3 (except the root) with a number of vertices of degree 3
connected by edges of weight 0 [12]. Some results on ultrametric trees were
obtained in [1,2,6,13]. Because of the high computational complexity,
biologists usually construct the trees by heuristic algorithms. For example,
UPGMA (Unweighted Pair Group Method with Arithmetic mean, see [14] for an
introduction) is a popular heuristic algorithm to construct ultrametric trees.

In this paper, we examine the complexity and the approximability of the minimum
ultrametric tree construction problem, and develop an efficient branch and
bound algorithm which finds an optimal solution for instances of moderate size in
reasonable time. With the rapid development of techniques for DNA sequencing
and alignment, the dissimilarities between species are often obtained by
DNA (or RNA, protein) sequence alignment. Depending on the scoring scheme
of alignment, the distance matrix may or may not obey the triangle inequality,
that is, it may or may not be a metric. In [6], the problem of constructing
a minimum ultrametric tree from a nonmetric distance matrix was shown to
be NP-complete and not approximable within ratio $n^{\varepsilon}$ in polynomial time for
some $\varepsilon > 0$ if NP $\ne$ P. In this paper, we show that the problem remains
NP-hard if the distance matrix is a metric, but can be approximated within ratio
$1.5(1 + \lceil \log n \rceil)$.

From a practical point of view, the number of species is often not very large.
So it seems possible to compute an optimal tree by exhaustive search,
that is, by checking all the possible trees. However, for n species, the number of
unrooted, leaf labeled binary trees is $1 \times 3 \times 5 \times \cdots \times (2n - 5)$ [12]. In the rooted case,
for any unrooted tree, we can locate the root at any edge of the tree. Therefore,
the number of rooted, leaf labeled binary trees is $A(n) = 1 \times 3 \times 5 \times \cdots \times (2n - 3)$. The
function A grows very rapidly. For example, $A(10) > 3 \times 10^{7}$, $A(15) > 2 \times 10^{14}$,
and $A(20) > 8 \times 10^{21}$. Clearly, it becomes impossible to search exhaustively for
a minimal tree even when n is moderate.
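The growth of $A(n)$ can be checked with a few lines of Python. This is only an illustrative sketch; the function name is an assumption, and Python's arbitrary-precision integers are relied on to avoid overflow.

```python
def rooted_tree_count(n):
    """A(n) = 1 * 3 * 5 * ... * (2n - 3), for n >= 2 species."""
    count = 1
    for odd in range(3, 2 * n - 2, 2):  # odd factors 3, 5, ..., 2n - 3
        count *= odd
    return count


for n in (10, 15, 20):
    print(n, rooted_tree_count(n))
```

For n = 10 the product is 34459425, which already rules out naive enumeration once each candidate tree requires nontrivial work.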

Branch and bound is a strategy to avoid exhaustive search. Theoretically,
a branch and bound algorithm cannot ensure a polynomial time complexity
in the worst case. But it has been successfully used to solve some NP-hard problems.
In addition, a branch and bound algorithm can often find near-optimal
solutions as well as an optimal one. This may be an important feature in the
evolutionary tree problem since the correct tree is not necessarily a minimal one.
In [12], a branch and bound algorithm was designed to construct a minimum
evolutionary tree of 11 species, and there is no report of algorithms for constructing
minimum evolutionary trees of more species. In this paper, we present
a branch and bound algorithm for constructing a minimum ultrametric tree from
a metric or nonmetric distance matrix. The experimental results show that the
algorithm can solve the problem in reasonable time for $n \le 25$ if the input is a
metric, or $n \le 19$ for the nonmetric input case.

The rest of the paper is organized as follows: In Section 2, we give some 
preliminaries. We then show the complexity and the approximation algorithm 
in Section 3. The branch and bound algorithm will be presented in Section 4. 
Section 5 presents the experimental results. Finally, concluding remarks are given 
in Section 6. 

2 Preliminaries 

We first give some definitions as follows: 

Definition 1. A metric M is an ultrametric iff $M[i,j] \le \max\{M[i,k], M[j,k]\}$
for all i, j, k [1].
Definition 2. Let $T = (V, E, w)$ be an edge weighted tree and $i, j \in V$. The
path length from i to j is denoted by $d(T, i, j)$. The weight of T, denoted by
$w(T)$, is defined as $\sum_{e \in E} w(e)$.

Definition 3. Let T be a rooted tree and r be any node of T. $T_r$ denotes the
subtree rooted at r, and $L(T)$ denotes the leaf set of T.

Definition 4. An ultrametric tree T of $\{1..n\}$ is a rooted, edge weighted binary
tree with $L(T) = \{1..n\}$, and for each node v of T, $d(T, v, i) = d(T, v, j)$ for all
$i, j \in L(T_v)$. So, we can define $height(v) = d(T, v, i)$ for any $i \in L(T_v)$.

It should be noted that an $n \times n$ metric M is an ultrametric if and only if there is
an ultrametric tree T of $\{1..n\}$ such that $d(T, i, j) = M[i,j]$ $\forall i, j$ [1].

Definition 5. Minimum Ultrametric Tree from General distances (MUTG):
Given an $n \times n$ distance matrix M (not necessarily a metric), find an ultrametric
tree T such that $L(T) = \{1..n\}$, $d(T, i, j) \ge M[i,j]$ $\forall i, j$, and $w(T)$ is minimum.
The Minimum Ultrametric Tree from Metric distances (MUTM) problem has
the same definition except that M is a metric.

In the following paragraphs, unless specifically indicated, a tree is a rooted,
nonnegative edge weighted binary tree. Given a tree $T = (V, E, w)$, the unweighted
tree $P = (V, E)$ is called the topology of T. A tree with topology P
and weight w is denoted as $P(w)$.

3 The complexity and approximability 

The MUTG problem was defined and shown to be NP-hard in [6]. The authors also
proved that it is hard to approximate. Precisely, it was shown that there
is an $\varepsilon > 0$ such that the MUTG problem cannot be approximated in polynomial
time within ratio $n^{\varepsilon}$ unless NP = P. However, the complexity of MUTM was not
known. In this section, we shall show that MUTM is NP-hard, but has different
approximability. We first show some properties which will be used in the proofs
and in the branch and bound algorithm.

Definition 6. Min Ultrametric Tree with a given Topology (MUTT) problem:
Given any distance matrix M and a topology P with $L(P) = \{1..n\}$, find a
nonnegative edge weight function w such that $T = P(w)$ is an ultrametric tree,
$d(T, i, j) \ge M[i,j]$ $\forall i, j$, and $w(T)$ is minimum.

Definition 7. Let P and M be the tree topology and distance matrix of an
MUTT problem, and s be any node of P. $h(s) = \max\{M[i,j] \mid i, j \in L(P_s)\}/2$.

Lemma 1. Let $T = P(w)$ be the solution of an MUTT problem with input M
and P. For any internal node s of P, $height(s) = h(s)$.
Proof. It is easy to see that $height(s) \ge h(s)$; otherwise there exist two leaves
$i, j \in T_s$ such that $d(T, i, j) = 2\,height(s) < M[i,j]$. Assume there is an internal
node s such that $height(s) = h(s) + \delta > h(s)$, and $height(v) = h(v)$ $\forall v \in T_s$. Let
x and y be the sons of s; then $w(s, x) \ge \delta$ and $w(s, y) \ge \delta$. If s is the root, we
can set $w_2(s, x) = w(s, x) - \delta$ and $w_2(s, y) = w(s, y) - \delta$, and obtain a feasible
solution with smaller tree size. If s is not the root and z is the father of s, we can
set $w_2(s, x) = w(s, x) - \delta$, $w_2(s, y) = w(s, y) - \delta$, and $w_2(z, s) = w(z, s) + \delta$, and
also obtain a feasible solution with smaller tree size, which again contradicts
the assumption that $w(T)$ is minimal. □

From Lemma 1 and the property of ultrametric tree, we have the following 
corollaries: 

Corollary 2. Let $T = P(w)$ be the solution of an MUTT problem with input
M and P, r be any internal node of P, and u, v be the two sons of r. Then
$w(T_r) = w(T_u) + w(T_v) + 2h(r) - h(u) - h(v)$.

Corollary 3. Let $T = P(w)$ be the solution of an MUTT problem with input
M and P. Then $w(T) = height(r) + \sum_{s \in T} height(s)$, where r is the root of T.

Based on Lemma 1 and Corollary 2, it is not hard to compute the edge weight 
of the MUTT problem using a postorder traversal of the tree. We list the result 
but omit the algorithm in this abstract. 

Theorem 4. Given the distance matrix and the topology of the ultrametric 
tree, the MUTT problem can be solved in linear time. 
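Since the algorithm itself is omitted, the following Python sketch shows one way the postorder computation could look, using Lemma 1 for the node heights and Corollary 3 for the tree weight. The nested-tuple encoding of a topology, the 0-based leaf labels, and the function name are assumptions made for this illustration. Each pair of leaves is examined once, at its lowest common ancestor, so the work is proportional to the size of the matrix apart from list handling.

```python
def mutt_weight(topology, M):
    """Weight of the minimum ultrametric tree with the given topology.

    A leaf is an int index into M; an internal node is a pair (left, right).
    """
    heights = []  # h(s) for every internal node s, collected in postorder

    def visit(node):
        if isinstance(node, int):
            return [node], 0.0
        left, h_left = visit(node[0])
        right, h_right = visit(node[1])
        # Largest distance between a leaf on the left and a leaf on the right.
        cross = max(M[i][j] for i in left for j in right)
        # h(s) is half the largest distance among all leaves below s.
        h = max(h_left, h_right, cross / 2)
        heights.append(h)
        return left + right, h

    _, root_height = visit(topology)
    # Corollary 3: w(T) = height(root) + sum of heights over all nodes.
    return root_height + sum(heights)


# Hypothetical 3-species matrix: species 0 and 1 are close, 2 is far.
M = [[0, 2, 4],
     [2, 0, 4],
     [4, 4, 0]]
```

With this matrix, the topology `((0, 1), 2)` has weight 5 and the topology `((0, 2), 1)` has weight 6, so grouping the two close species first gives the smaller tree.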

We now show that the MUTM problem is NP-hard. The proof is similar to
the one for MUTG in [6]. Given a graph $G = (V, E)$, we construct a matrix M
in which $M[i,j] = 4$ if $(i, j) \in E$, and $M[i,j] = 2$ otherwise. Obviously, M is a
metric. We can show that G is k-colorable if and only if there is an ultrametric
tree T of M where $w(T) = k$. Since the Graph k-colorability problem is
NP-complete [10], we conclude that the MUTM problem is NP-hard.

Theorem 5. MUTM is NP-hard. 

Unlike MUTG, which was shown to be hard to approximate, MUTM
can be approximated within ratio $1.5(1 + \lceil \log n \rceil)$. We show this by giving an
approximation algorithm.

Definition 8. Let M be an $n \times n$ distance matrix and $G = (V, E, w)$ be the
corresponding complete graph of M, in which $V = \{1..n\}$ and $w(i, j) = M[i,j]$.
$TSP(M)$ denotes the length of a minimal Hamiltonian cycle on G, which is the
solution of the Travelling Salesman Problem on G.

Lemma 6. Let T be a minimum ultrametric tree of M. Then $w(T) \ge TSP(M)/2$.
Proof. Let C be an Euler tour on T. Then the length of the tour C is $w(C) =
2w(T)$. Without loss of generality, assume C visits the leaves in the order 1, 2, 3, ..., n.
We have

$$w(C) = \sum_{i=2}^{n} d(T, i-1, i) + d(T, n, 1) \ge \sum_{i=2}^{n} M[i-1, i] + M[n, 1].$$

Since $TSP(M)$ is the minimum length among all Hamiltonian cycles, we have
$w(C) \ge TSP(M)$. It follows that $w(T) \ge TSP(M)/2$. □

Definition 9. Let $i \le j$. $CBTT(i, j)$ is a binary tree topology with leaf set
$\{i, i+1, \ldots, j\}$. If $i = j$, it contains only one vertex i. Otherwise, if r is the root
and x, y are the two sons of r, then the two subtrees rooted at x and y are
$P_x = CBTT(i, \lfloor (i+j-1)/2 \rfloor)$ and $P_y = CBTT(\lfloor (i+j-1)/2 \rfloor + 1, j)$. We
also define that the level of the root is 0, and a node has level $i + 1$ if its father has
level i. Obviously, the leaves of $CBTT(1, n)$ have level $\lceil \log n \rceil$ or $\lceil \log n \rceil - 1$.
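The recursive definition translates directly into code. The sketch below assumes that the split point is $\lfloor (i+j-1)/2 \rfloor$ and encodes a topology as nested Python tuples; both the encoding and the helper `leaf_levels` are illustrative assumptions rather than part of the paper.

```python
def cbtt(i, j):
    """Nested-tuple topology CBTT(i, j) over the leaves i, i+1, ..., j."""
    if i == j:
        return i
    mid = (i + j - 1) // 2
    return (cbtt(i, mid), cbtt(mid + 1, j))


def leaf_levels(node, level=0):
    """Map every leaf to its level, with the root at level 0."""
    if isinstance(node, int):
        return {node: level}
    levels = leaf_levels(node[0], level + 1)
    levels.update(leaf_levels(node[1], level + 1))
    return levels


tree = cbtt(1, 5)
levels = leaf_levels(tree)
```

For five leaves this yields `((1, 2), (3, (4, 5)))`, whose leaves sit at levels 2 and 3, in line with the remark that all leaves lie on the last two levels.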

Our algorithm is based on the 1.5-approximation algorithm of the Travelling 
Salesman Problem with metric input [10]. The approximation algorithm is listed 
as follows: 



Algorithm Approx_MUTM
Input: a metric M.
Output: an ultrametric tree T with $w(T) \le 1.5(1 + \lceil \log n \rceil)\,OPT$,
        where OPT is the weight of a minimum ultrametric tree of M.
Step 1: Run the 1.5-approximation algorithm for the TSP problem.
        Relabel the leaves such that the solution is (1, 2, ..., n), that is,
        $\sum_{i=2}^{n} M[i-1, i] + M[n, 1] \le 1.5\,TSP(M)$.
Step 2: Construct a topology $P = CBTT(1, n)$.
Step 3: Find the minimal edge weight function w by solving the MUTT
        problem, and output $T = P(w)$ as an approximation solution.

Lemma 7. Let T be the ultrametric tree constructed by algorithm
Approx_MUTM with input M. Then $w(T) \le 0.75(\lceil \log n \rceil + 1)\,TSP(M)$.

Proof. First, for any internal node s which is the root of $CBTT(a, b)$,

$$h(s) = \max\{M[i,j] \mid a \le i < j \le b\}/2 \le \sum_{i=a+1}^{b} M[i-1, i]/2$$

by the triangle inequality. Let $s_1, s_2, \ldots, s_k$ be the nodes of level i, where $s_j$ is the
root of $CBTT(a_j, b_j)$. Obviously, $\{a_1..b_1\}, \ldots, \{a_k..b_k\}$ is a partition of
$\{1..n\}$. So,

$$\sum_{j=1}^{k} height(s_j) \le \sum_{i=2}^{n} M[i-1, i]/2 \le 0.75\,TSP(M).$$

Since the tree has $(\lceil \log n \rceil + 1)$ levels and the nodes at the last level have
height 0,

$$\sum_{s \in T} height(s) \le 0.75 \lceil \log n \rceil\,TSP(M).$$

Let r be the root of T. From Corollary 3,

$$w(T) = height(r) + \sum_{s \in T} height(s) \le 0.75(\lceil \log n \rceil + 1)\,TSP(M).$$

□



Theorem 8 follows directly from Lemmas 6 and 7.

Theorem 8. The MUTM problem can be approximated in polynomial time within
ratio $1.5(1 + \lceil \log n \rceil)$, where n is the number of species.

4 A branch and bound algorithm 

Before presenting the branch and bound algorithm, we first show some useful 
properties: 

4.1 Some properties 

Definition 10. Let M be a matrix. $\max(M)$ denotes $\max_{i,j}\{M[i,j]\}$.

Definition 11. Let P be a topology, and $a, b \in L(P)$. $LCA(a, b)$ denotes the
lowest common ancestor of a and b. Let x, y be two nodes of P. We define $x \rightarrow y$
if and only if x is an ancestor of y.

The following lemma can be proved from Lemma 1, and we omit the proof in 
this abstract. 

Lemma 9. If $M[u, v] = \max(M)$, there exists a minimum ultrametric tree T
such that u and v are in the two subtrees of root r.

Definition 12. Let P be a topology. A relation $R(P)$ is defined to be
$\{(a, b, c) \mid a, b, c \in L(P),\ LCA(a, c) = LCA(b, c) \rightarrow LCA(a, b)\}$.

Definition 13. Let P, Q be topologies. $P \subseteq Q$ if and only if $R(P) \subseteq R(Q)$.

It should be noted that $P \subseteq Q$ means that for all leaves of P, the topological
relations are the same in P and Q. It also means that we can obtain Q by inserting
the leaves in $L(Q) - L(P)$ into P. The insertion operation is defined below:

Definition 14. Let $P = (V, E)$ be a topology and $e = (a, b) \in E$. $Q = Insert(P, e, x)$
is a topology obtained by inserting a new leaf x into P at e. The insertion
replaces e with two edges $(a, s)$ and $(s, b)$, and inserts an edge $(s, x)$, in which s is
a new internal node.

It is easy to see that $P \subseteq Q$ if $Q = Insert(P, e, x)$. For any topology P, P
can be obtained by a sequence of insertions following any specified order of the leaves in $L(P)$.
To develop the branch and bound algorithm, we find a lower bound function
which holds for both metric and nonmetric input. We list the result but omit
the proof in this abstract.

Lemma 10. Let $T(i)$ be a minimum ultrametric tree of M with leaf set $\{1..i\}$
and topology P, and $T(n)$ be a minimum ultrametric tree of M with
leaf set $\{1..n\}$ that contains $T(i)$. Then
$w(T(n)) \ge w(T(i)) + \sum_{j=i+1}^{n} \min\{M[k,j] \mid k < j\}/2$.

Lemma 10 is true for any permutation of species. To make the algorithm 
more efficient, we hope the lower bound is as large as possible, and use a special 
permutation of species. The permutation is defined as follows, and the efficiency 
will be shown in Section 5. 

Definition 15. Let M be an n by n distance matrix. A permutation $(a_1, a_2, \ldots, a_n)$
of $\{1..n\}$ is called a maxmin permutation if $M[a_1, a_2] = \max(M)$ and
$\min_{k<i}\{M[a_i, a_k]\} \ge \min_{k<i}\{M[a_j, a_k]\}$ $\forall\, 1 < i < j$.

Given any distance matrix M, it is not hard to find a maxmin permutation 
in linear time. We omit the algorithm in this abstract. 
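One way to obtain such a permutation is a farthest-first ordering: start with a pair at maximum distance, then repeatedly append the species whose nearest already-placed species is farthest away. The Python sketch below is an illustration under that reading of Definition 15, with 0-based labels and a function name chosen here; it runs in time proportional to the number of matrix entries and is not the authors' omitted algorithm.

```python
def maxmin_permutation(M):
    """Farthest-first ordering of the species 0..n-1 (requires n >= 2)."""
    n = len(M)
    # Start with a pair of species at maximum distance.
    a, b = max(((i, j) for i in range(n) for j in range(i + 1, n)),
               key=lambda pair: M[pair[0]][pair[1]])
    order = [a, b]
    # closest[x] = distance from x to the nearest species already placed.
    closest = {x: min(M[x][a], M[x][b]) for x in range(n) if x not in (a, b)}
    while closest:
        nxt = max(closest, key=closest.get)
        order.append(nxt)
        del closest[nxt]
        for x in closest:
            closest[x] = min(closest[x], M[x][nxt])
    return order


# Hypothetical metric: four species at positions 0, 1, 5, 10 on a line.
positions = [0, 1, 5, 10]
M = [[abs(p - q) for q in positions] for p in positions]
order = maxmin_permutation(M)
```

On the line example the two extreme species come first, followed by the species in the middle, and the species next to an endpoint comes last.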



4.2 The algorithm 

The branch and bound algorithm is a tree search algorithm. The algorithm
repeatedly searches the Branch and Bound Tree (BBT) for better solutions until
an optimal solution is found. Instead of listing the algorithm, we describe its
key points in the following:

— The algorithm first finds a maxmin permutation. After relabelling, assume
(1..n) is a maxmin permutation.

— The root (level 0) of BBT represents the minimum ultrametric tree of 1, 2.
It is a tree with root r in which leaves 1 and 2 are the sons of r and
$w(1, r) = w(2, r) = M[1,2]/2$. According to Lemma 9, there is an optimal solution
that contains this topology. A son of a node with level i is said to have level
$i + 1$.

— A node v at level i represents the topology of an ultrametric tree T of
$\{1..i+2\}$. $C(v)$ is the minimum tree size for this topology.

— The branching rule: Any topology represented by a node at level i
has $i + 2$ leaves and $2 + 2i$ edges. Therefore, there are $2i + 2$ ways to insert
leaf $i + 3$ into the topology, and each results in a different topology. A node
at level i has $2i + 2$ sons.

— The upper bound: Initially, the algorithm uses a heuristic algorithm UPGMM
(Unweighted Pair Group Method with Maximum) to find a feasible
solution. While the algorithm is running, it retains the best solution so far and
uses it as an upper bound. UPGMM is modified from the popular heuristic
algorithm UPGMA. The heuristic is described as follows: Initially, there are
n ultrametric trees, and each contains only one species. It finds two trees
A, B such that $\max\{M[x,y] \mid x \in L(A), y \in L(B)\}$ is minimum, and merges
A and B into one ultrametric tree by creating a common root, and then
constructs the tree recursively on the $n - 1$ trees.

— The lower bound: According to Lemma 10, for a node v at level i, we can
set the lower bound by $LB(v) = C(v) + \sum_{j=i+3}^{n} \min\{M[k,j] \mid k < j\}/2$.

— When the upper bound is updated, that is, a better solution is found, all nodes
will be checked. Any node whose lower bound is no less than the upper bound
can be deleted (bounding) safely since any tree containing this topology is
of tree size no less than the upper bound.

— Search strategy: Two strategies are often used: Depth-First and Best- 
First. It is believed that the Depth-First search uses less space while the 
Best-First search often achieves better time efficiency. In our algorithm, we 
use both strategies to get a balance between time and space. 

— Data structure for a node of BBT: Instead of storing the ultrametric tree
in a node, we only store the attached edge where the new leaf is inserted.
Furthermore, the edge can be represented by a code sequence of 0s and 1s. Starting
from the root and setting the code empty, if the edge is in the left (right)
subtree, we append a code 0 (1) and go down to the left (right) son. Continuing
this process, we get the 0-1 code for each edge. Under this data
structure, we need to reconstruct the tree every time we select a node to
branch. For the sake of efficiency, we modify the Best-First search strategy.
When a node is selected, our Best-First search only selects the local best
node among its children. When the last level is reached or all the branches of
a node are bounded, it selects the global best node again. It should be noted
that the tree topology in a BBT node is uniquely determined by the codes
of it and all its ancestors.
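The UPGMM heuristic used for the initial upper bound can be sketched as follows. This is a deliberately naive Python illustration, not the authors' C implementation: the function name, the nested-tuple topology, the 0-based labels, and the rule that the new root is placed at half the largest distance between the two merged leaf sets (and never below either subtree root) are assumptions made here.

```python
def upgmm(M):
    """Greedy merging by smallest maximum distance; returns (topology, weight)."""
    # Each cluster is (topology, leaves, height of its root).
    clusters = [(i, [i], 0.0) for i in range(len(M))]
    weight = 0.0
    while len(clusters) > 1:
        best = None
        for p in range(len(clusters)):
            for q in range(p + 1, len(clusters)):
                far = max(M[x][y]
                          for x in clusters[p][1] for y in clusters[q][1])
                if best is None or far < best[0]:
                    best = (far, p, q)
        far, p, q = best
        (ta, la, ha), (tb, lb, hb) = clusters[p], clusters[q]
        h = max(far / 2, ha, hb)          # height of the new common root
        weight += (h - ha) + (h - hb)     # two new edges down to the old roots
        merged = ((ta, tb), la + lb, h)
        clusters = [c for k, c in enumerate(clusters) if k not in (p, q)]
        clusters.append(merged)
    return clusters[0][0], weight


# Hypothetical 3-species matrix: species 0 and 1 are close, 2 is far.
M = [[0, 2, 4],
     [2, 0, 4],
     [4, 4, 0]]
topology, weight = upgmm(M)
```

On this matrix the two close species are merged first at height 1 and the third species joins at height 2, for a total weight of 5. The pairwise scan makes the sketch far slower than a tuned implementation, which is acceptable here because it only supplies the starting upper bound.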

5 The experimental results 

5.1 The environment and data instances 

We have implemented the above methods in C on PCs (Pentium-90)
running MS-DOS. The data instances are generated randomly. Since the distances
are even integers between 2 and 100, there is no rounding error for the
tree size. Both metric and nonmetric data were tested.



5.2 Results of running time 

For metric input, we ran experiments with the number of species ranging from 12
to 25. The results are shown in Table 1. In the table, n is the number of species,
and the second row indicates how many data instances we ran for each
n. For each n, the worst and the average running times are listed below. The last
row presents the median running time over the instances. For example, suppose we ran the
program with 5 (different) data instances for 10 species, and the running times
are 1, 1, 2, 6, 10 seconds respectively. Then the worst case is 10 seconds, the average is 4, and
the median is 2. For nonmetric input, we performed experiments with the number
of species ranging from 9 to 19, and the results are shown in Table 2.
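
As a quick check of the worked example above, the three statistics reported per column can be computed as follows. The function name `summarize` is hypothetical and only the five example running times from the text are used.

```python
from statistics import mean, median


def summarize(times):
    """Return (worst case, average, median) of a list of running times."""
    return max(times), mean(times), median(times)


# The example from the text: five instances for 10 species.
worst, average, med = summarize([1, 1, 2, 6, 10])
```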



Table 1: Running time (seconds) for metric input

n                12   13   14   15   16    17    18    19    20     24     25
# of instances  100  100  100   50   50    20    20    20    20      5      5
worst case        8   15   33   63  333  2730  6047  6786  1374  53126  23273
average           1    2    3    6   16   210   409   702   317  13724   8644
median            1    1    1    2    3    12    33   195   101   3492   5584



Table 2: Running time (seconds) for nonmetric input

n                 9   10   11   12   13    14    15     16     17     18     19
# of instances  100  100  100  100  100   100    50     50     10      5      5
worst case        2    6    8  134  368  1617  2433  12420  26154  44627  29895
average           1    1    2    6   33   114   293   1288   8933  13852  18862
median            1    1    1    2    6    62   165    586   3160   5206  24469



5.3 The benefit of maxmin permutation 

To show the benefit of maxmin permutation, we compare the running time of 
algorithms with and without using maxmin permutation. The comparison is 
shown in Table 3, and the experiments are for metric input. Without maxmin 
permutation, it takes more than one day for n=21 while it completes within one 
day for n=25 with maxmin permutation. We have also performed some tests for 
nonmetric cases (not listed in the table). Without maxmin permutation, it failed 
to complete a test for n=16 within one day. It took no more than one day for 
n=19 if the maxmin permutation was used. 



Table 3: The efficiency of maxmin permutation (time in secs)

n                11   12   13   14   15    16     17     18     19     20
with maxmin       2    8   15   33   63   333   2730   6043   6786   1374
without maxmin    2   13   61  898  624  1356  43001  31705  40742  41367



5.4 The performance of UPGMM 

We also recorded the fraction of instances for which UPGMM finds the optimal
solution. The results are shown in Table 4. For metric cases, UPGMM finds the
optimal solution with high probability only when n < 10, and for nonmetric cases, it can
hardly find the optimum even when n = 8. On the other hand, we found
that the relative error between the optimal tree size and the tree size found by
UPGMM is very small. In our tests, it never exceeds 5%. So UPGMM provides a very
good upper bound for the branch and bound algorithm, and it indeed makes our
algorithm very time efficient.

Table 4: The fraction of data instances for which the optimum is found by UPGMM

n                8    9   10   11   12   13   14   15   16   17   18   19   20   24   25
metric input   .84  .78  .68  .62  .61  .44  .45  .40  .32  .45  .15  .35  .20    0  .20
general input  .40  .25  .24  .13  .11  .09  .10  .02  .04    0    0    0    -    -    -



6 Concluding remarks

The branch and bound algorithm has two other important features: (1) It can
be easily modified to generate all near-optimal solutions. (2) It can be easily
parallelized. We close this paper by mentioning a few open problems.
Theoretically, the approximability of the problem with metric input is still unknown. In
this paper, we only show that it is not as hard as the nonmetric problem. Another
important direction for future work is to examine the complexity and approximability of the
additive tree construction problem, and to develop a similar branch and bound
algorithm for it.

Abstract. A dominating set D of an undirected graph G is a set of
vertices such that every vertex not in D is adjacent to at least one vertex
in D. Given an undirected graph G, the minimum cardinality dominating
set problem is to find a dominating set of G with the minimum number of
vertices. The minimum cardinality dominating set problem is NP-hard for
general graphs. For permutation graphs, the best-known algorithm ran
in O(n log log n) time, where n is the number of vertices. In this paper,
we present an optimal O(n) algorithm.

1 Introduction 

Let $\pi = [\pi_1, \pi_2, \ldots, \pi_n]$ be a permutation of the numbers 1, 2, ..., n. A
permutation graph $G[\pi] = (V, E)$ with respect to $\pi$ is defined as follows [3]:

$$V = \{1, 2, \ldots, n\}$$

and

$$(i,j) \in E \iff (i - j)(\pi_i^{-1} - \pi_j^{-1}) < 0,$$

where $\pi_k^{-1}$ is the position of k in $\pi$. A graph G is a permutation graph if and
only if G is isomorphic to some $G[\pi]$. Fig. 1 shows the permutation graph of a
given permutation $\pi = [3, 1, 5, 7, 4, 2, 6]$. In this paper, we assume that the input
is a permutation $\pi = [\pi_1, \pi_2, \ldots, \pi_n]$.
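
The edge rule can be checked directly on the permutation used in Fig. 1. The following is a minimal sketch, not part of the paper: the function name `permutation_graph_edges` is hypothetical, and it uses a quadratic scan over all pairs purely for clarity.

```python
def permutation_graph_edges(pi):
    """List the edges (i, j), i < j, of the permutation graph G[pi].

    pos[k] is the 1-based position of k in pi (the inverse permutation).
    Two vertices are adjacent iff their order by value and their order by
    position disagree, i.e. (i - j) * (pos[i] - pos[j]) < 0.
    """
    pos = {value: index for index, value in enumerate(pi, start=1)}
    n = len(pi)
    return [
        (i, j)
        for i in range(1, n + 1)
        for j in range(i + 1, n + 1)
        if (i - j) * (pos[i] - pos[j]) < 0
    ]


edges = permutation_graph_edges([3, 1, 5, 7, 4, 2, 6])
```

For this permutation the sketch yields eight edges; for example, vertex 2 is adjacent to 3, 4, 5 and 7 because all four appear before 2 in the permutation.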

For an undirected graph $G = (V, E)$, a vertex $i \in V$ is said to dominate
another vertex $j \in V$ if $(i, j) \in E$. For any two sets $S_1$ and $S_2$, the set $S_2 \setminus S_1$ is the
set of all elements which belong to $S_2$ but do not belong to $S_1$. For an undirected
graph $G = (V, E)$, a vertex set $S_1$ is said to dominate another vertex set $S_2$ if
every vertex in $S_2 \setminus S_1$ is dominated by at least one vertex in $S_1$. Let $S_1 \succ S_2$ and
$S_1 \not\succ S_2$ denote that $S_1$ dominates $S_2$ and $S_1$ does not dominate $S_2$ respectively.
A vertex set S is said to be a dominating set for G if $S \succ V$. The minimum
cardinality dominating set (MCDS) problem and the minimum weighted
dominating set (MWDS) problem are to find a dominating set S for G where the
number and the total weight of vertices of S are minimized respectively.

Fig. 1. A permutation graph $G[\pi]$ for $\pi = [3, 1, 5, 7, 4, 2, 6]$, with $\pi^{-1} = [2, 6, 1, 5, 3, 7, 4]$.

Both the MCDS and the MWDS problems are NP-hard for general graphs
[2]. On permutation graphs, Farber and Keil [1] proposed $O(n^2)$ and $O(n^3)$
algorithms for the MCDS and the MWDS problems respectively, based upon
the dynamic programming method. Later, by utilizing the monotone ordering
among the intermediate terms of the recursive formula in [1], Tsai and Hsu [6]
improved the time-complexities to $O(n \log\log n)$ and $O(n^2 \log^2 n)$ respectively.
For the MWDS problem, Liang et al. [4] proposed an $O(n(n + m))$ algorithm,
where m is the number of edges. Later, they reduced the time-complexity to
$O(n + m)$ [5]. In this paper, we propose an optimal $O(n)$ algorithm for
solving the MCDS problem on permutation graphs. Our algorithm is based on
a new recursive formula obtained by the dynamic programming method, which is
different from the formula in [1]. There is also a monotone ordering among the
intermediate terms of our recursive formula. We then propose new updating
rules so that we can design an optimal linear time algorithm.

2 The Dynamic Programming Approach 

Consider a permutation graph $G[\pi]$ defined by a permutation
$\pi = [\pi_1, \pi_2, \ldots, \pi_n]$. Throughout this paper, we assume that the permutation
$\pi$ is given. Following the notations defined in [1,6], define $V_i = \{\pi_1, \pi_2, \ldots, \pi_i\}$
and $V_{i,j} = V_i \cap \{1, 2, \ldots, j\}$. For each i, $1 \le i \le n$, we define $\pi_i^*$ to be the
minimum number over the suffix $\pi_i, \ldots, \pi_n$. Define $V_i^* = V_i \cup \{\pi_i^*\}$. For the
example in Fig. 1, $V_4 = \{3, 1, 5, 7\}$, $V_{4,3} = \{1, 3\}$ and $V_4^* = \{3, 1, 5, 7, 2\}$.
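
The sets just defined can be computed mechanically, which makes it easy to confirm the example values for Fig. 1. This is an illustrative sketch only; the names `prefix_sets` and `v_ij` are hypothetical and indices are 1-based to match the text.

```python
def prefix_sets(pi):
    """Return (V, pi_star, V_star), each a dict indexed by i = 1..n.

    V[i]       = {pi_1, ..., pi_i}
    pi_star[i] = minimum over the suffix pi_i, ..., pi_n
    V_star[i]  = V[i] together with pi_star[i]
    """
    n = len(pi)
    V = {i: set(pi[:i]) for i in range(1, n + 1)}
    pi_star = {i: min(pi[i - 1:]) for i in range(1, n + 1)}
    V_star = {i: V[i] | {pi_star[i]} for i in range(1, n + 1)}
    return V, pi_star, V_star


def v_ij(V, i, j):
    """V_{i,j}: the members of V_i that are at most j."""
    return {v for v in V[i] if v <= j}


V, pi_star, V_star = prefix_sets([3, 1, 5, 7, 4, 2, 6])
```

Here `pi_star[4]` is 2, the smallest of 7, 4, 2, 6, which is why 2 joins the set for i = 4 even though it has not yet appeared in the prefix.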

For any vertex set S, define max(S) to be the maximum number in S. For
each i and j, $1 \le i, j \le n$, define $D_{i,j}$ as follows:

1. $D_{i,j}$ is a minimum cardinality subset of $V_i^*$ dominating $V_{i,j}$.

2. $\max(D_{i,j})$ is as large as possible.

Obviously, $D_{n,n}$ is a desired minimum cardinality dominating set for G.
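
For small inputs, the two conditions defining the set $D_{i,j}$ (a minimum cardinality subset of $V_i^*$ dominating $V_{i,j}$, with the largest possible maximum) can be checked by exhaustive search. The sketch below is exponential and is in no way the paper's algorithm; the names `brute_force_D` and `dm` are hypothetical, and the convention that an empty set has maximum 0 is taken from the i = 1 case discussed later in the paper.

```python
from itertools import combinations


def brute_force_D(pi, i, j):
    """Exhaustively find one set satisfying the definition of D_{i,j}."""
    pos = {value: index for index, value in enumerate(pi, start=1)}

    def adjacent(u, v):
        return (u - v) * (pos[u] - pos[v]) < 0

    V_i = set(pi[:i])
    V_star = V_i | {min(pi[i - 1:])}
    target = {v for v in V_i if v <= j}          # this is V_{i,j}
    if not target:
        return set()
    for size in range(1, len(V_star) + 1):
        feasible = [
            set(S)
            for S in combinations(sorted(V_star), size)
            if all(v in S or any(adjacent(u, v) for u in S) for v in target)
        ]
        if feasible:
            # Among minimum-size dominating subsets, prefer the largest maximum.
            return max(feasible, key=max)
    raise AssertionError("unreachable: V_star itself dominates the target")


def dm(pi, i, j):
    """Return (|D_{i,j}|, max(D_{i,j})), using 0 as the maximum of the empty set."""
    D = brute_force_D(pi, i, j)
    return (len(D), max(D) if D else 0)
```

On the permutation of Fig. 1, `dm(pi, 7, 7)` gives a minimum dominating set of size 3 whose largest vertex is 7, for example {3, 5, 7}.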

Let X be a set of subsets of V. Let S be a non-empty element in X such
that S has the minimum cardinality among all elements in X and max(S)
is as large as possible, if $X \ne \emptyset$ and $X \ne \{\emptyset\}$. Then, define set_min(X)
as follows: set_min(X) = $\emptyset$ if $X = \emptyset$ or $X = \{\emptyset\}$, and set_min(X) = S
otherwise. set_min(X) may not be unique. If there is more than one candidate
for set_min(X), select any one to be set_min(X). It is easy to prove that if X,
Y and Z are sets of subsets of V and $X = Y \cup Z$, we have set_min(X) =
set_min({set_min(Y), set_min(Z)}).
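
A direct transcription of set_min may help in reading the proofs that follow. This is a sketch under the reading that set_min picks a non-empty set of minimum cardinality and, among those, one with the largest maximum; ties beyond that are broken arbitrarily, as the text allows. The Python name `set_min` mirrors the paper's operator but the code itself is not from the paper.

```python
def set_min(X):
    """Pick a non-empty set of minimum cardinality with the largest maximum.

    X is any iterable of sets of positive integers. The result is the empty
    set when X is empty or contains only the empty set.
    """
    candidates = [S for S in X if S]
    if not candidates:
        return set()
    return min(candidates, key=lambda S: (len(S), -max(S)))


# The decomposition property: set_min over a union can be computed from
# the set_min of each part.
Y = [{1, 2}, {3}]
Z = [{5}, {2, 4, 6}]
whole = set_min(Y + Z)
combined = set_min([set_min(Y), set_min(Z)])
```

The decomposition property is what lets the proof of Theorem 1 split the candidate family into three parts and treat each with its own lemma.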

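According to the definitions of $D_{i,j}$ and set_min, we have the following:

$$D_{i,j} = \begin{cases} \emptyset & \text{if } V_{i,j} = \emptyset, \\ \text{set\_min}(\{S \mid S \subseteq V_i^* \text{ and } S \succ V_{i,j}\}) & \text{otherwise.} \end{cases}$$

Consider the case where i = 1. We have the following rule:

$$D_{1,j} = \begin{cases} \emptyset & \text{if } j < \pi_1, \\ \{\pi_1\} & \text{otherwise.} \end{cases}$$

Let $D_{\pi^*} = D_{i-1,\pi_i^*} \cup \{\pi_i^*\}$, $D_{\pi_i} = D_{i-1,\pi_i} \cup \{\pi_i\}$ and $D_{\max} = D_{i-1,j} \cup \{\max(V_i)\}$.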

Lemma 1 For each $i_1$, $i_2$ and j, $1 \le i_1 \le i_2 \le n$ and $1 \le j \le n$, $V_{i_1,j} \subseteq V_{i_2,j}$
and $V_{i_1}^* \subseteq V_{i_2}^*$.

Proof. The proof is straightforward and omitted. □

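Lemma 2 For each i and j, $1 < i \le n$ and $1 \le j \le n$,

$$\{D_{\pi^*}, D_{\pi_i}, D_{\max}\} \subseteq \{S \mid S \subseteq V_i^* \text{ and } S \succ V_{i,j}\}.$$

Proof. $D_{i-1,\pi_i^*} \subseteq V_{i-1}^* \subseteq V_i^*$ by Lemma 1. Hence (1) $D_{i-1,\pi_i^*} \cup \{\pi_i^*\} \subseteq V_i^*$. Since
$\{\pi_i^*\} \succ V_{i,j} \setminus V_{i-1,\pi_i^*}$ and $D_{i-1,\pi_i^*} \succ V_{i-1,\pi_i^*}$, (2) $V_{i,j}$ is dominated by
$D_{i-1,\pi_i^*} \cup \{\pi_i^*\}$. According to (1) and (2), we have $D_{\pi^*} \in \{S \mid S \subseteq V_i^*$ and $S \succ
V_{i,j}\}$. The proofs for $D_{\pi_i}$ and $D_{\max}$ are similar and omitted. □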

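Lemma 3 For each i and j, $1 < i \le n$ and $\pi_i^* \le j \le n$, if $Y \subseteq \{S \mid \pi_i^* \in S, S \subseteq
V_i^*$ and $S \succ V_{i,j}\}$, then set_min$(Y \cup \{D_{\pi^*}\}) = D_{\pi^*}$.

Proof. Let X denote $\{S \mid \pi_i^* \in S, S \subseteq V_i^*$ and $S \succ V_{i,j}\}$. Suppose $Y \subseteq X$ and Y is
not empty. Consider any $A \in Y$. Since $V_{i,j} \supseteq V_{i-1,\pi_i^*}$ and $A \succ V_{i,j}$, $A \succ V_{i-1,\pi_i^*}$.
Since no vertex in $V_{i-1,\pi_i^*}$ is dominated by $\pi_i$ or $\pi_i^*$, $A \setminus \{\pi_i, \pi_i^*\} \succ V_{i-1,\pi_i^*}$. Since
$A \subseteq V_i^* = V_{i-1} \cup \{\pi_i, \pi_i^*\}$, $A \setminus \{\pi_i, \pi_i^*\} \subseteq V_{i-1} \subseteq V_{i-1}^*$. Therefore, $A \setminus \{\pi_i, \pi_i^*\} \in
\{S \mid S \subseteq V_{i-1}^*$ and $S \succ V_{i-1,\pi_i^*}\}$. Note that $\pi_i^* \in A$. Then, $|D_{i-1,\pi_i^*}| \le |A| - 1$
and $|D_{i-1,\pi_i^*}| = |A| - 1$ only if $\pi_i^* = \pi_i$ or $\pi_i \notin A$. Hence, $|D_{\pi^*}| \le |A|$ and if
$|D_{\pi^*}| = |A|$, $\max(D_{\pi^*}) \ge \max(A)$. Therefore, set_min$(Y \cup \{D_{\pi^*}\}) = D_{\pi^*}$. □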

The proofs of the following three lemmas are similar to that of Lemma 3 and are omitted.

Lemma 4 For each i and j, $1 < i \le n$ and $\pi_i \le j \le n$, if $Y \subseteq \{S \mid \pi_i \in S, \pi_i^* \notin
S, S \subseteq V_i^*$ and $S \succ V_{i,j}\}$, then set_min$(Y \cup \{D_{\pi_i}\}) = D_{\pi_i}$. □

Lemma 5 For each i and j, $1 < i \le n$ and $\pi_i \le j \le n$, if $\max(D_{i-1,j}) < \pi_i$
and $Y \subseteq \{S \mid \pi_i \notin S, \pi_i^* \notin S, S \subseteq V_i^*$ and $S \succ V_{i,j}\}$, then set_min$(Y \cup \{D_{\max}\}) =
D_{\max}$. □

Lemma 6 For each i and j, $1 < i \le n$ and $1 \le j \le n$, if $Y \subseteq \{S \mid \pi_i \notin S, \pi_i^* \notin
S, S \subseteq V_i^*$ and $S \succ V_{i,j}\}$, then set_min$(Y \cup \{D_{i-1,j}\}) = D_{i-1,j}$. □

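Theorem 1 The following recursive formula correctly computes $D_{i,j}$, where $1 <
i \le n$ and $1 \le j \le n$.

$$D_{i,j} = \begin{cases} \text{set\_min}(\{D_{\pi^*}, D_{\pi_i}, D_{\max}\}) & \text{if } j \ge \pi_i \text{ and } \max(D_{i-1,j}) < \pi_i, \\ \text{set\_min}(\{D_{i-1,j}, D_{\pi^*}, D_{\pi_i}\}) & \text{otherwise.} \end{cases}$$

Proof. Let $X = \{S \mid S \subseteq V_i^*$ and $S \succ V_{i,j}\}$, $X_1 = \{S \mid \pi_i^* \in S, S \subseteq V_i^*$ and $S \succ
V_{i,j}\}$, $X_2 = \{S \mid \pi_i \in S, \pi_i^* \notin S, S \subseteq V_i^*$ and $S \succ V_{i,j}\}$ and $X_3 = \{S \mid \pi_i \notin S, \pi_i^* \notin
S, S \subseteq V_i^*$ and $S \succ V_{i,j}\}$. It is obvious that $X = X_1 \cup X_2 \cup X_3$. According to
Lemma 2, $\{D_{\pi^*}, D_{\pi_i}, D_{\max}\} \subseteq X$. Hence,

set_min(X) = set_min($X \cup \{D_{\pi^*}, D_{\pi_i}, D_{\max}\}$)
           = set_min({set_min($X_1 \cup \{D_{\pi^*}\}$),
             set_min($X_2 \cup \{D_{\pi_i}\}$), set_min($X_3 \cup \{D_{\max}\}$)}).

Furthermore, if $D_{i-1,j} \in X$, since $|D_{i-1,j}| < |D_{\max}|$, then

set_min(X) = set_min($X \cup \{D_{\pi^*}, D_{\pi_i}, D_{i-1,j}\}$)
           = set_min({set_min($X_1 \cup \{D_{\pi^*}\}$),
             set_min($X_2 \cup \{D_{\pi_i}\}$), set_min($X_3 \cup \{D_{i-1,j}\}$)}).

Case 1. $j \ge \pi_i$ and $\max(D_{i-1,j}) < \pi_i$:

According to Lemmas 3, 4 and 5, we have

set_min(X) = set_min($\{D_{\pi^*}, D_{\pi_i}, D_{\max}\}$).

Case 2. $j \ge \pi_i$ and $\max(D_{i-1,j}) > \pi_i$:

It can be easily shown that $D_{i-1,j} \in X$. By Lemmas 3, 4 and 6, we have

set_min(X) = set_min($\{D_{\pi^*}, D_{\pi_i}, D_{i-1,j}\}$).

Case 3. $\pi_i^* \le j < \pi_i$:

It can be easily shown that $D_{i-1,j} \in X$. Furthermore, since $j < \pi_i$, no vertex
in $V_{i,j}$ is dominated by $\pi_i$. Hence $\pi_i$ must not belong to $D_{i,j}$. Therefore, $X_2$
and $D_{\pi_i}$ need not be considered for $D_{i,j}$. Then, by Lemmas 3 and 6,

set_min(X) = set_min({set_min($X_1 \cup \{D_{\pi^*}\}$), set_min($X_3 \cup \{D_{i-1,j}\}$)})
           = set_min($\{D_{\pi^*}, D_{i-1,j}\}$)
           = set_min($\{D_{\pi^*}, D_{\pi_i}, D_{i-1,j}\}$).

Case 4. $j < \pi_i^*$:

It can be easily shown that $D_{i-1,j} \in X$. Furthermore, since $j < \pi_i^* \le \pi_i$,
neither $\pi_i^*$ nor $\pi_i$ belongs to $D_{i,j}$. We need not consider $X_1$, $X_2$, $D_{\pi^*}$ and $D_{\pi_i}$
for finding $D_{i,j}$. By Lemma 6, we have

set_min(X) = set_min($X_3 \cup \{D_{i-1,j}\}$) = $D_{i-1,j}$.

Since both $D_{\pi^*}$ and $D_{\pi_i}$ belong to X, the following equality holds:

set_min(X) = $D_{i-1,j}$ = set_min($\{D_{\pi^*}, D_{\pi_i}, D_{i-1,j}\}$).

Note that all cases are considered. Hence, the proof is complete. □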



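3 The New Updating Rules

For each i and j, define $d_{i,j} = |D_{i,j}|$ and $m_{i,j} = \max(D_{i,j})$.

Lemma 7 (The Monotone Lemma) For each i, if $j_1 < j_2$, then $d_{i,j_1} \le d_{i,j_2}$.
Furthermore, if $j_1 < j_2$ and $d_{i,j_1} = d_{i,j_2}$, then $m_{i,j_1} \ge m_{i,j_2}$.

Proof. Suppose $j_1 < j_2$. We have $V_{i,j_1} \subseteq V_{i,j_2}$. Note that
$D_{i,j_2} \subseteq V_i^*$. Hence, $D_{i,j_2} \in \{S \mid S \subseteq V_i^*$ and $S \succ V_{i,j_1}\}$. By the definition of
$D_{i,j_1}$, the proof is complete. □

Lemma 8 For each i and j, $1 < i \le n$ and $1 \le j \le n$, $d_{i-1,j} \le d_{i-1,\pi_i^*} + 1$.
Furthermore, if $d_{i-1,j} = d_{i-1,\pi_i^*} + 1$, then $m_{i-1,j} \ge m_{i-1,\pi_i^*}$.

Proof. Consider the set $V_{i-1,j} \setminus V_{i-1,\pi_i^*}$. If $V_{i-1,j} \setminus V_{i-1,\pi_i^*}$ is an empty set, we
have $V_{i-1,j} \subseteq V_{i-1,\pi_i^*}$. Hence, $d_{i-1,j} \le d_{i-1,\pi_i^*}$. Suppose $V_{i-1,j} \setminus V_{i-1,\pi_i^*}$ is not
empty. For any vertex $v \in V_{i-1,j} \setminus V_{i-1,\pi_i^*}$, we have v is dominated by $\pi_{i-1}^*$. Hence,
$D_{i-1,\pi_i^*} \cup \{\pi_{i-1}^*\} \in \{S \mid S \subseteq V_{i-1}^*$ and $S \succ V_{i-1,j}\}$. Therefore, $d_{i-1,j} \le d_{i-1,\pi_i^*} + 1$.
Furthermore, if $d_{i-1,j} = d_{i-1,\pi_i^*} + 1$, $m_{i-1,j} \ge m_{i-1,\pi_i^*}$. □

Define $m_i^* = \max(D_{i-1,\pi_i^*} \cup \{\pi_i^*\})$. Obviously, $m_i^* = \max(\{m_{i-1,\pi_i^*}, \pi_i^*\})$.
Our new updating rules are listed in Table 1.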





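Case (1). $d_{i-1,\pi_i} = d_{i-1,\pi_i^*}$:

(1a) If $d_{i-1,j} = d_{i-1,\pi_i^*} + 1$, then

$$(d_{i,j}, m_{i,j}) = \begin{cases} (d_{i-1,\pi_i^*} + 1,\ \pi_i) & \text{if } m_{i-1,j} < \pi_i, \\ (d_{i-1,j},\ m_{i-1,j}) & \text{otherwise.} \end{cases}$$

(1b) If $d_{i-1,j} = d_{i-1,\pi_i^*}$, then

$$(d_{i,j}, m_{i,j}) = \begin{cases} (d_{i-1,\pi_i^*} + 1,\ \max(V_i)) & \text{if } m_{i-1,j} < \pi_i \text{ and } j \ge \pi_i, \\ (d_{i-1,j},\ m_{i-1,j}) & \text{otherwise.} \end{cases}$$

(1c) If $d_{i-1,j} < d_{i-1,\pi_i^*}$, then

$$(d_{i,j}, m_{i,j}) = (d_{i-1,j},\ m_{i-1,j}).$$

Case (2). $d_{i-1,\pi_i} = d_{i-1,\pi_i^*} + 1$:

(2a) If $d_{i-1,j} = d_{i-1,\pi_i^*} + 1$, then

$$(d_{i,j}, m_{i,j}) = \begin{cases} (d_{i-1,\pi_i^*} + 1,\ m_i^*) & \text{if } (m_{i-1,j} < \pi_i \text{ and } j \ge \pi_i) \text{ or } (m_{i-1,j} < m_i^*), \\ (d_{i-1,j},\ m_{i-1,j}) & \text{otherwise.} \end{cases}$$

(2b) If $d_{i-1,j} \le d_{i-1,\pi_i^*}$, then

$$(d_{i,j}, m_{i,j}) = (d_{i-1,j},\ m_{i-1,j}).$$

Table 1. The New Updating Rules.

Theorem 2 The updating rules listed in Table 1 correctly compute $d_{i,j}$ and $m_{i,j}$
for $1 < i \le n$ and $1 \le j \le n$.

Proof. By Lemma 8, $d_{i-1,j} \le d_{i-1,\pi_i^*} + 1$. In particular, $d_{i-1,\pi_i} \le d_{i-1,\pi_i^*} + 1$. Since
$\pi_i \ge \pi_i^*$, $d_{i-1,\pi_i} \ge d_{i-1,\pi_i^*}$ by Lemma 7. Hence, there are only two cases which
should be considered: (1) $d_{i-1,\pi_i} = d_{i-1,\pi_i^*}$ and (2) $d_{i-1,\pi_i} = d_{i-1,\pi_i^*} + 1$.

Case (1). $d_{i-1,\pi_i} = d_{i-1,\pi_i^*}$:

Since $d_{i-1,\pi_i} = d_{i-1,\pi_i^*}$ and $\pi_i^* \le \pi_i$, we have $m_{i-1,\pi_i^*} \ge m_{i-1,\pi_i}$.

(1a). $d_{i-1,j} = d_{i-1,\pi_i^*} + 1$:

Since $d_{i-1,j} > d_{i-1,\pi_i}$, we have $j > \pi_i$. If $m_{i-1,j} < \pi_i$, by Theorem 1,
$D_{i,j}$ = set_min($\{D_{\pi^*}, D_{\pi_i}, D_{\max}\}$). Since $d_{i-1,j} = d_{i-1,\pi_i^*} + 1$, we have
$\pi_i > m_{i-1,j} \ge m_{i-1,\pi_i^*} \ge m_{i-1,\pi_i}$, by Lemma 8. Hence, $\max(D_{\pi_i}) \ge
\max(D_{\pi^*})$. Note that $|D_{\pi^*}| = |D_{\pi_i}| = d_{i-1,\pi_i^*} + 1$ and $|D_{\max}| = d_{i-1,j} +
1 = d_{i-1,\pi_i^*} + 2$. Therefore, $D_{i,j} = D_{\pi_i}$. Otherwise, i.e. if $m_{i-1,j} > \pi_i$,
by Theorem 1, $D_{i,j}$ = set_min($\{D_{i-1,j}, D_{\pi^*}, D_{\pi_i}\}$). Note that $|D_{i-1,j}| =
|D_{\pi^*}| = |D_{\pi_i}|$. Furthermore, we have $m_{i-1,j} > \pi_i \ge \pi_i^*$ and $m_{i-1,\pi_i^*} \ge
m_{i-1,\pi_i}$, and by Lemma 8, $m_{i-1,j} \ge m_{i-1,\pi_i^*}$. Hence, $D_{i,j} = D_{i-1,j}$.

(1b). $d_{i-1,j} = d_{i-1,\pi_i^*}$:

If $j \ge \pi_i$ and $m_{i-1,j} < \pi_i$, $D_{i,j}$ = set_min($\{D_{\pi^*}, D_{\pi_i}, D_{\max}\}$) by
Theorem 1. Note that $|D_{\pi^*}| = |D_{\pi_i}| = |D_{\max}| = d_{i-1,\pi_i^*} + 1$. Hence, $D_{i,j} =
D_{\max}$. Otherwise, by Theorem 1, $D_{i,j}$ = set_min($\{D_{i-1,j}, D_{\pi^*}, D_{\pi_i}\}$).
Note that $|D_{i-1,j}| < |D_{\pi^*}| = |D_{\pi_i}|$. Hence, $D_{i,j} = D_{i-1,j}$.

(1c). $d_{i-1,j} < d_{i-1,\pi_i^*}$:

We have $j < \pi_i$. By Theorem 1, $D_{i,j}$ = set_min($\{D_{i-1,j}, D_{\pi^*}, D_{\pi_i}\}$).
Note that $|D_{i-1,j}| < |D_{\pi^*}| = |D_{\pi_i}|$. Hence, $D_{i,j} = D_{i-1,j}$.

Case (2). $d_{i-1,\pi_i} = d_{i-1,\pi_i^*} + 1$: In this case, we have $|D_{\pi_i}| = d_{i-1,\pi_i^*} + 2$.

(2a). $d_{i-1,j} = d_{i-1,\pi_i^*} + 1$:

If $m_{i-1,j} < \pi_i$ and $j \ge \pi_i$, $D_{i,j}$ = set_min($\{D_{\pi^*}, D_{\pi_i}, D_{\max}\}$) by
Theorem 1. Note that $|D_{\pi^*}| < |D_{\pi_i}| = |D_{\max}|$. Hence $D_{i,j} = D_{\pi^*}$. Otherwise,
$D_{i,j}$ = set_min($\{D_{i-1,j}, D_{\pi^*}, D_{\pi_i}\}$). Note that $|D_{i-1,j}| = |D_{\pi^*}| < |D_{\pi_i}|$.
Hence, if $m_{i-1,j} < m_i^*$, $D_{i,j} = D_{\pi^*}$. Otherwise, we have $D_{i,j} = D_{i-1,j}$.

(2b). $d_{i-1,j} \le d_{i-1,\pi_i^*}$:

Since $d_{i-1,j} < d_{i-1,\pi_i}$, $j < \pi_i$. Hence, $D_{i,j}$ = set_min($\{D_{i-1,j}, D_{\pi^*}, D_{\pi_i}\}$).
Since $|D_{i-1,j}| < |D_{\pi^*}| < |D_{\pi_i}|$, $D_{i,j} = D_{i-1,j}$. □

For example, consider the permutation graph shown in Fig. 1. The updating
of the $(d_{i,j}, m_{i,j})$'s by using Theorem 2 is shown in Table 2.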

4 The Linear Time Algorithm 

We now use Theorem 2 and show that we can obtain an O(n) time algorithm.

         j = 1    2      3      4      5      6      7
i = 1    (0,0)  (0,0)  (1,3)  (1,3)  (1,3)  (1,3)  (1,3)
i = 2    (1,3)  (1,3)  (1,3)  (1,3)  (1,3)  (1,3)  (1,3)
i = 3    (1,3)  (1,3)  (1,3)  (1,3)  (2,5)  (2,5)  (2,5)
i = 4    (1,3)  (1,3)  (1,3)  (1,3)  (2,5)  (2,5)  (2,3)
i = 5    (1,3)  (1,3)  (1,3)  (2,7)  (2,5)  (2,5)  (2,4)
i = 6    (1,3)  (1,3)  (1,3)  (2,7)  (2,5)  (2,5)  (2,4)
i = 7    (1,3)  (1,3)  (1,3)  (2,7)  (2,5)  (3,7)  (3,7)

Table 2. The values of the $(d_{i,j}, m_{i,j})$'s for the permutation graph in Fig. 1.



From Lemma 7, we know that if $j_1 < j < j_2$ and $(d_{i,j_1}, m_{i,j_1}) = (d_{i,j_2}, m_{i,j_2})$,
then $(d_{i,j_1}, m_{i,j_1}) = (d_{i,j}, m_{i,j}) = (d_{i,j_2}, m_{i,j_2})$. Thus we do not have to
compute all $d_{i,j}$'s and $m_{i,j}$'s. We only have to compute the range of j's such that
$(d_{i,j}, m_{i,j})$ is the same in this range.

Consider the case when i = 1.

According to the discussion in Section 2, there are only two cases.

(1) $1 \le j < \pi_1$: $D_{1,j} = \emptyset$, $d_{1,j} = 0$ and $m_{1,j} = 0$.

(2) $\pi_1 \le j \le n$: $D_{1,j} = \{\pi_1\}$, $d_{1,j} = 1$ and $m_{1,j} = \pi_1$.

For each $1 \le i \le n$, let us divide the range of j into blocks. In each block,
the 2-tuple $(d_{i,j}, m_{i,j})$ is the same for that range of j's. Each such largest block is
called a dm-block. For instance, when i = 1, there are two dm-blocks, as in Fig. 2.
Each dm-block is associated with a 4-tuple $[j_1, j_2, d, m]$ where the variables
$j_1$ and $j_2$, $j_1 \le j_2$, define the range of this dm-block and $(d_{i,j}, m_{i,j}) = (d, m)$ for
each j, $j_1 \le j \le j_2$. We define an $\alpha$-group to be all of the dm-blocks $[j_1, j_2, d, m]$
where $d = \alpha$. According to Lemma 7, all of the dm-blocks in an $\alpha$-group occupy a
consecutive range of j's and the rightmost (resp. leftmost) dm-block $[j_1, j_2, d, m]$
is associated with the smallest (resp. largest) value of m.



Fig. 2. The dm-blocks for i = 1: the block $j = 1, 2, \ldots, \pi_1 - 1$ carries (0, 0) and the block $j = \pi_1, \ldots, n$ carries $(1, \pi_1)$.
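
To make the block structure concrete, the sketch below compresses one row of (d, m) pairs into dm-blocks and groups them by d. It is only an illustration of the representation, not of the linear-time update; the names `dm_blocks` and `alpha_groups` are hypothetical, and the two sample rows are the i = 1 and i = 5 rows for the permutation of Fig. 1.

```python
from itertools import groupby


def dm_blocks(row):
    """Compress row[j-1] = (d, m), j = 1..n, into blocks [j1, j2, d, m]."""
    blocks = []
    j = 1
    for (d, m), run in groupby(row):
        length = len(list(run))
        blocks.append([j, j + length - 1, d, m])
        j += length
    return blocks


def alpha_groups(blocks):
    """Group dm-blocks by their d value, keeping left-to-right order."""
    groups = {}
    for block in blocks:
        groups.setdefault(block[2], []).append(block)
    return groups


row_1 = [(0, 0), (0, 0), (1, 3), (1, 3), (1, 3), (1, 3), (1, 3)]
row_5 = [(1, 3), (1, 3), (1, 3), (2, 7), (2, 5), (2, 5), (2, 4)]
blocks_1 = dm_blocks(row_1)
blocks_5 = dm_blocks(row_5)
groups_5 = alpha_groups(blocks_5)
```

In the i = 5 row the 2-group holds three blocks whose m values 7, 5, 4 decrease from left to right, as Lemma 7 predicts.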



The following is our algorithm to solve the problem. We shall discuss the 
data structure needed to implement the algorithm later. 

Algorithm A. Finding a MCDS on a Permutation Graph.

Input: A permutation $\pi = [\pi_1, \pi_2, \ldots, \pi_n]$.

Output: A minimum cardinality dominating set of the graph $G[\pi]$.

Step 1: Compute $\pi_i^*$ and $\max(V_i)$ for all i's.

Step 2: For i = 1, add $[1, \pi_1 - 1, 0, 0]$ into the 0-group and add $[\pi_1, n, 1, \pi_1]$
        into the 1-group.

Step 3: $i \leftarrow i + 1$.
        If (i > n), then goto Step 7.
        Find $d_{i-1,\pi_i}$, $d_{i-1,\pi_i^*}$ and $m_{i-1,\pi_i^*}$.
        $m_i^* \leftarrow \max(\{\pi_i^*, m_{i-1,\pi_i^*}\})$.
        If $d_{i-1,\pi_i} = d_{i-1,\pi_i^*}$ then goto Step 4 else goto Step 6.

Step 4: /* Case 1a */
        For the $(d_{i-1,\pi_i^*} + 1)$-group, find the rightmost dm-block B.
        $[j_1, j_2, d, m] \leftarrow B$.
        $j_2' \leftarrow j_2$.
        While ($m < \pi_i$) Do
            Mark this dm-block.
            $j_1' \leftarrow j_1$.
            Find the left dm-block B' of the current dm-block in this group.
            $[j_1, j_2, d, m] \leftarrow B'$.
        EndOfWhile
        Delete all marked dm-blocks in this group.
        Insert $[j_1', j_2', d_{i-1,\pi_i^*} + 1, \pi_i]$ to the right end of the $(d_{i-1,\pi_i^*} + 1)$-group.

Step 5: /* Case 1b */
        For the $d_{i-1,\pi_i}$-group, find the rightmost dm-block B.
        $[j_1, j_2, d, m] \leftarrow B$.
        $j_2' \leftarrow j_2$.
        While ($m < \pi_i$) and ($j_1 \ge \pi_i$) Do
            Mark this dm-block.
            $j_1' \leftarrow j_1$.
            Find the left dm-block B' of the current dm-block in this group.
            $[j_1, j_2, d, m] \leftarrow B'$.
        EndOfWhile
        Delete all marked dm-blocks in this group.
        If ($m < \pi_i$) and ($j_2 \ge \pi_i$), then
            Delete this block.
            Insert dm-block $[j_1, \pi_i - 1, d, m]$ to the right end of this group.
            $j_1' \leftarrow \pi_i$.
        EndOfIf
        Find the leftmost dm-block B'' of the $(d_{i-1,\pi_i^*} + 1)$-group.
        $[j_1, j_2, d, m] \leftarrow B''$.
        If $m = \max(V_i)$, then $j_2' \leftarrow j_2$ and delete this dm-block B''.
        Insert dm-block $[j_1', j_2', d_{i-1,\pi_i^*} + 1, \max(V_i)]$ to the left end of the
        $(d_{i-1,\pi_i^*} + 1)$-group.
        Goto Step 3.

Step 6: /* Case 2a */
        For the $d_{i-1,\pi_i}$-group, find the rightmost dm-block B.
        $[j_1, j_2, d, m] \leftarrow B$.
        $j_2' \leftarrow j_2$.
        While (($m < \pi_i$) and ($j_1 \ge \pi_i$)) or ($m < m_i^*$) Do
            Mark this dm-block.
            $j_1' \leftarrow j_1$.
            Find the left dm-block B' of the current dm-block in this group.
            $[j_1, j_2, d, m] \leftarrow B'$.
        EndOfWhile
        Delete all marked dm-blocks in this group.
        If ($m < \pi_i$) and ($j_2 \ge \pi_i$) then
            Delete this block.
            Insert dm-block $[j_1, \pi_i - 1, d, m]$ to the right end of this group.
            $j_1' \leftarrow \pi_i$.
        EndOfIf
        Insert dm-block $[j_1', j_2', d_{i-1,\pi_i^*} + 1, m_i^*]$ to the right end of this group.
        Goto Step 3.

Step 7:
        Find a MCDS by backtracking.

The End of Algorithm A.

In the above algorithm, $m_{i-1,\pi_i^*}$ has to be found. This is one of the most
critical steps. A straightforward way is to scan all of the dm-blocks. This would
be time-consuming. We shall use the fact that $\pi_{i-1}^* \le \pi_i^*$. Using this fact, we
only start from the dm-block $[j_1, j_2, d, m]$ in which $j_1 \le \pi_{i-1}^* \le j_2$ and
move to the right dm-block until we reach the dm-block $[j_1', j_2', d', m']$ in which
$j_1' \le \pi_i^* \le j_2'$. Then, $m_{i-1,\pi_i^*}$ is equal to $m'$. Thus, in the amortized sense, only
a linear scan is used for obtaining all of the $m_{i-1,\pi_i^*}$'s.

In the following, we shall discuss the data structure needed to implement the 
above algorithm. 

First of all, for each $\alpha$-group, $\alpha$ is an integer and $0 \le \alpha \le n$. Thus,
we may have an array of size n + 1. Whenever an $\alpha$-group is created, we
create two pointers inside the $\alpha$-th cell. The first and the second pointers always
point to the leftmost and the rightmost dm-blocks of this group respectively.

Secondly, within each $\alpha$-group, there are several dm-blocks. They are
linked by a doubly-linked list so that locating the leftmost and the
rightmost dm-blocks can be done in O(1) time. Since in our algorithm insertions
and deletions occur at the ends of the $\alpha$-group, each insertion or deletion can
be done in O(1) time with this doubly-linked list. Furthermore, as we indicated
previously, we need to find the $m_{i-1,\pi_i^*}$'s. With the array of $\alpha$-groups and all
doubly-linked lists, this can be done by traversing the lists from the dm-block
$[j_1, j_2, d, m]$ in which $j_1 \le \pi_{i-1}^* \le j_2$, where this dm-block was found in the
(i − 1)-th iteration.

For time-complexity analysis, note the following facts:

1. In Step 1, the computation of all $\pi_i^*$'s is a suffix-minima computation and it
can be done in O(n) time by scanning the permutation array from right to
left. Similarly, the computation of all $\max(V_i)$'s is a prefix-maxima
computation and it can be done in O(n) time by scanning the permutation array
from left to right.

2. In Step 2, there are 2 insertions.

3. Once Step 3 is executed, the algorithm goes either through Steps 4 and 5 or
through Step 6. For the path going through Steps 3, 4 and 5 (Steps 3 and 6),
at most 3 (2) insertions are executed. Thus in total at most 3n + 2 insertions
are executed.

4. The total number of deletions is less than the total number of insertions.
Thus there are at most 3n + 1 deletions.

5. The time needed to find all $m_{i-1,\pi_i^*}$'s is O(n) because we always start
from the previous dm-block where we found $m_{i-2,\pi_{i-1}^*}$ and traverse the linked
lists to the right.

From the above discussion, we conclude that the time-complexity of Algorithm
A is O(n) in the amortized sense.
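
Fact 1 above, the preprocessing of Step 1, amounts to two linear scans. A minimal sketch follows; the function names `suffix_minima` and `prefix_maxima` are hypothetical, and the returned lists are 0-indexed, so entry k corresponds to i = k + 1 in the paper's notation.

```python
def suffix_minima(pi):
    """star[k] = min(pi[k:]), computed by one right-to-left scan."""
    star = [0] * len(pi)
    current = pi[-1]
    for k in range(len(pi) - 1, -1, -1):
        current = min(current, pi[k])
        star[k] = current
    return star


def prefix_maxima(pi):
    """top[k] = max(pi[:k + 1]), computed by one left-to-right scan."""
    top = [0] * len(pi)
    current = pi[0]
    for k, value in enumerate(pi):
        current = max(current, value)
        top[k] = current
    return top


pi = [3, 1, 5, 7, 4, 2, 6]
pi_star = suffix_minima(pi)
max_V = prefix_maxima(pi)
```

The suffix minima are non-decreasing from left to right, which is the monotonicity the amortized scan for the m values relies on.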

5 Conclusions

In this paper, we presented an algorithm for finding a minimum cardinality
dominating set for a permutation graph G. This algorithm exploits some subtle
properties of this problem and runs in O(n) time in the amortized sense. Thus it is
optimal. Our algorithm is easy to implement because doubly linked lists can be
used.

Abstract. In this paper we discuss how to compute the edit distance
(or similarity) between two images. We present new similarity measures
and show how to compute them. They can be used to perform more general
two-dimensional approximate pattern matching. Previous work on
two-dimensional approximate string matching either works with only
substitutions or uses a restricted edit distance that allows only some types of errors.

1 Introduction 

A number of important problems related to string processing lead to algorithms
for approximate string matching: text searching, pattern recognition,
computational biology, audio processing, etc. Two-dimensional pattern matching with
errors has applications in computer vision.

The edit distance between two strings a and b, ed(a, b), is defined as the
minimum number of edit operations that must be carried out to make them equal.
The allowed operations are insertion, deletion and substitution of characters in
a or b. The problem of approximate string matching is defined as follows: given
a text of length n, and a pattern of length m, both being sequences over an
alphabet $\Sigma$ of size $\sigma$, find all segments (or "occurrences") in the text whose edit
distance to the pattern is at most k, where $0 \le k \le m$. The classical solution takes
O(mn) time and involves dynamic programming [14].
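
The classical dynamic programming computation of ed(a, b) between two whole strings can be sketched as follows. This is the textbook recurrence with unit costs, kept to two rows of the table; the function name `ed` follows the paper's notation, and the searching variant that reports occurrences in a text is not shown.

```python
def ed(a, b):
    """Edit distance with unit-cost insertion, deletion and substitution."""
    prev = list(range(len(b) + 1))          # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        cur = [i]                           # deleting the first i characters of a
        for j, cb in enumerate(b, start=1):
            cur.append(min(
                prev[j] + 1,                # delete ca
                cur[j - 1] + 1,             # insert cb
                prev[j - 1] + (ca != cb),   # match or substitute
            ))
        prev = cur
    return prev[-1]
```

The table has (|a| + 1)(|b| + 1) cells and each is filled in constant time, which gives the O(mn) bound quoted above.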

Krithivasan and Sitalakshmi (KS) [11] proposed the following extension of
edit distance for two dimensions. Given two images of the same shape, the edit
distance is the sum of the edit distances of the corresponding row images. This
definition is justified when the images are transmitted row by row and there are
not too many communication errors. However, for many other problems, this
distance does not reflect well simple cases of approximate matching in different
settings. For example, we could have a match that only has the middle row of
the pattern missing. In the definition above, the edit distance would be $O(m^2)$
if all pattern rows are different. Intuitively, the right answer should be at most
2m, because only m characters were deleted in the pattern and m characters are
inserted at the bottom. In this paper we extend the edit distance to two
dimensions, lifting the problem just mentioned and also extending the edit distance to
images of different shapes.

This paper is organized as follows. First we discuss previous work on
two-dimensional pattern matching with errors and image similarity. Next, we
introduce new notions of similarity between two-dimensional strings or images. As for
the one-dimensional counterpart, we consider first the comparison of two images
and we also discuss how to compute these new similarity measures efficiently.

Then, we look at the problem of finding approximate matches with at most
k errors of a rectangular pattern image of size m x m in a larger rectangular
image (the text) of size n x n. We present a O(m^n^) worst case algorithm, 
and an $O(n^2 k \log_\sigma m / m^2)$ algorithm for k up to $O(m^2 / \log m)$, where $\sigma$ denotes the size of
the (finite) alphabet, for two of the new measures that we define. We end by
discussing possible extensions and open problems.

2 Previous Work 

Two-dimensional approximate string matching usually considers only
substitutions for rectangular patterns, which is much simpler than the general case with
insertions and deletions (because in that case, rows and/or columns of the pattern
can match pieces of the text of different lengths). For substitutions, the pattern
shape matches the same shape in the text.

If we consider matching the pattern with at most k substitutions, one of
the best results on the worst case is due to Amir and Landau [2], achieving
$O((k + \log \sigma) n^2)$ time but using $O(n^2)$ space. A similar algorithm is presented in
Crochemore and Rytter [5]. Ranka and Heywood [13], on the other hand, solve
the problem in $O((k + m) n^2)$ time and $O(kn)$ space. Amir and Landau also
present a different algorithm running in $O(n^2 \log n \log\log n \log m)$ time. On
average, the best algorithm is due to Karkkainen and Ukkonen [9], with its analysis
and space usage improved by Park [12]. The expected time is $O((n^2 k \log_\sigma m) / m^2)$
for

$$k \le \left\lfloor \frac{m}{\lceil \log_\sigma (m^2) \rceil} \right\rfloor \left\lfloor \frac{m}{2} \right\rfloor \approx \frac{m^2}{4 \log_\sigma m}$$

using $O(m^2)$ space ($O(k)$ space on average). This time result is optimal for the
expected case.

Under the KS definition, Krithivasan [10] presents an $O(m(k + \log m) n^2)$
algorithm that uses $O(mn)$ space. This was improved (for k < m) by Amir and
Landau [2] to $O(k^2 n^2)$ worst case time using $O(n^2)$ space. Amir and Farach [1] also
considered non-rectangular patterns, achieving $O(k(k + \sqrt{m \log m} \sqrt{k \log k}) n^2)$
time. This algorithm is very complicated and impractical because it uses
numerical convolutions.

Very recently, Baeza-Yates and Navarro [4] obtained the first fast algorithm
on average for the KS model. They use a filter algorithm based on multiple
approximate string matching, achieving $O(n^2 k \log_\sigma m / m^2)$ average-case behavior
for $k < m(m + 1)/(5 \log_\sigma m)$, and using $O(m^2)$ space. This time matches the
best known result for the same problem allowing just substitutions and is
optimal [9], with the upper bound on k only slightly smaller. For higher error levels,
they present an algorithm with time complexity $O(n^2 k / (w \sqrt{\sigma}))$ (where w is the
size in bits of the computer word), which works for $k \le m(m + 1)(1 - e/\sqrt{\sigma})$.

Another related problem is geometric matching, where we have to match a
geometric figure or a set of points. In this case, the problem is in a continuous
space rather than a discrete space and usually the Hausdorff measure [3] is used.

There are other approaches to matching images, which are very different from
our approach (which belongs to what is called combinatorial pattern matching).
Among them we can mention techniques used in pattern matching related to
artificial intelligence (for example image processing and neural networks [15])
and techniques used in databases (extracting features of the image like color
histograms [6,7]).

3 Extending the Edit Distance 

Let a and b be two images of size nr x nc and mr x mc respectively. In the sequel
we use $row_i(a)$ to denote the i-th row of a and $col_i(a)$ to denote the i-th column
of a. For example, the KS distance is given by

$$KS(a,b) = \sum_{i=1}^{nr} ed(row_i(a), row_i(b))$$

with the restriction that nr = mr. We also use the L-shape idea of Giancarlo
[8] used for extending suffix trees to two dimensions. We denote by $LS_{i,j}(a)$ the
L-shaped string consisting of the first (left) j elements of the i-th row and the
first (top) i − 1 elements of the j-th column.

Because our main motivation is approximate matching, we assume that the
pattern and a text subimage are compared from top to bottom and from left to
right. That is, a subimage can be enlarged by extending its bottom and/or
right side. It can be argued that a pattern can match better when fixing a different
corner, but this does not make any difference, because that only changes the text
position where the match will be reported, and still only one match is found.
Another convention is that the text occurrence must have the same shape as the
pattern. Otherwise, we may have occurrences with at most k errors that
basically do not count unmatched characters on the boundaries, which is not
fair. Hence, although our similarity measures work for two images of different
size, they will be used later for subimages in the text that have the same shape
as the pattern.

First, we remove the limitation that the KS model cannot handle deletions or
insertions of whole rows. The solution is simple: we treat each row as a single string
which is compared to other rows using the normal (one-dimensional) edit distance ed.
If $R_{i,j}$ is the distance between rows 1 to i of image a and rows
1 to j of image b, we have that

$$R_{i,j} = \min\bigl(R_{i-1,j} + n_c,\; R_{i,j-1} + m_c,\; R_{i-1,j-1} + ed(row_i(a), row_j(b))\bigr)$$

where the boundary conditions are $R_{i,0} = i \cdot n_c$ and $R_{0,j} = j \cdot m_c$, and the
distance between the two images is given by $R(a,b) = R_{n_r,m_r}$.

In the example given in the introduction, the distance is reduced to at most
2m, instead of being $O(m^2)$ as in the KS model. Similarly, we could
use columns instead of rows, obtaining C(a,b). This model is much fairer
than the KS model. Although we use rectangular images, this measure also works
for any images where rows are connected and continuous.
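The recurrence for R can be sketched as follows. This is a minimal illustration, assuming images are given as lists of equal-length strings (one string per row) and that ed is the unit-cost edit distance; the names `edit_distance` and `row_distance` and the 4 × 4 example are hypothetical and not taken from the paper.

```python
def edit_distance(s, t):
    """Unit-cost one-dimensional edit distance (insert, delete, substitute)."""
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, start=1):
        cur = [i]
        for j, ct in enumerate(t, start=1):
            cur.append(min(prev[j] + 1,                # delete cs
                           cur[j - 1] + 1,             # insert ct
                           prev[j - 1] + (cs != ct)))  # match or substitute
        prev = cur
    return prev[-1]


def row_distance(a, b):
    """R(a, b): edit distance in which whole rows are the edited units."""
    nr, mr = len(a), len(b)
    nc, mc = len(a[0]), len(b[0])
    # R[i][j] = distance between the first i rows of a and the first j rows of b
    R = [[0] * (mr + 1) for _ in range(nr + 1)]
    for i in range(1, nr + 1):
        R[i][0] = i * nc          # boundary: delete i rows of a
    for j in range(1, mr + 1):
        R[0][j] = j * mc          # boundary: insert j rows of b
    for i in range(1, nr + 1):
        for j in range(1, mr + 1):
            R[i][j] = min(R[i - 1][j] + nc,
                          R[i][j - 1] + mc,
                          R[i - 1][j - 1] + edit_distance(a[i - 1], b[j - 1]))
    return R[nr][mr]


# Hypothetical example: b is a with its first row removed and a new last row added.
a = ["abcd", "efgh", "ijkl", "mnop"]
b = ["efgh", "ijkl", "mnop", "qrst"]
print(row_distance(a, b))  # 8, i.e. 2m for m = 4
```

In this example one row deletion and one row insertion suffice, so the distance is 2m, whereas a measure that cannot shift whole rows would have to pay for almost every cell.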

Generalizing this idea to simultaneous insertions and deletions in rows
and/or columns is not as simple. Suppose that we have two subimages that we
want to compare. One alternative is to decompose the border of a subimage into
rows or columns. Then we can use the following decompositions:

1. removing one row or one column from one of the subimages, or

2. removing one row or one column on the same side of each subimage and
computing the edit distance between them.

We can apply dynamic programming to find the best possible decomposition.
That is, if $RC_{i,j,k,\ell}$ is the distance between the left-top corner of a bounded by
row i and column j and the left-top corner of b bounded by row k and column
ℓ, we have that $RC_{i,j,k,\ell}$ is the minimum of the following values:

- $RC_{i-1,j,k,\ell} + j$, $RC_{i,j-1,k,\ell} + i$, $RC_{i,j,k-1,\ell} + \ell$, and $RC_{i,j,k,\ell-1} + k$, which
corresponds to deleting one row or column in one subimage; and

- $RC_{i-1,j,k-1,\ell} + ed(pref_j(row_i(a)), pref_\ell(row_k(b)))$ and
$RC_{i,j-1,k,\ell-1} + ed(pref_i(col_j(a)), pref_k(col_\ell(b)))$, where $pref_i(s)$ denotes the
first i characters of string s (a prefix), which corresponds to comparing
two rows at the bottom or two columns at the right.

The boundary conditions are $RC_{0,0,i,j} = RC_{i,j,0,0} = i \cdot j$. The distance RC(a,b)
is given by $RC_{n_r,n_c,m_r,m_c}$. Figure 1 shows all these cases. This distance can also
be applied to any convex image, for example circles or other regular polygons.
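The six cases of RC can be written down directly as a memoized recursion. The sketch below is only an illustration of the definition, not an efficient implementation. It assumes non-empty images given as lists of equal-length strings, and it assumes that a corner with no rows or no columns is empty, so that its distance to the other corner is that corner's area (the text states the boundary only for $RC_{0,0,i,j}$ and $RC_{i,j,0,0}$). The names `rc_distance` and `column` are hypothetical.

```python
from functools import lru_cache


def edit_distance(s, t):
    """Unit-cost edit distance between two sequences."""
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, start=1):
        cur = [i]
        for j, ct in enumerate(t, start=1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (cs != ct)))
        prev = cur
    return prev[-1]


def rc_distance(a, b):
    """RC(a, b) by direct recursion over the six decomposition cases."""
    nr, nc = len(a), len(a[0])
    mr, mc = len(b), len(b[0])

    def column(img, j):
        return tuple(row[j] for row in img)

    @lru_cache(maxsize=None)
    def rc(i, j, k, l):
        # Assumed boundary: an empty corner costs the area of the other corner.
        if i == 0 or j == 0:
            return k * l
        if k == 0 or l == 0:
            return i * j
        return min(
            rc(i - 1, j, k, l) + j,      # delete bottom row of a's corner
            rc(i, j - 1, k, l) + i,      # delete right column of a's corner
            rc(i, j, k - 1, l) + l,      # delete bottom row of b's corner
            rc(i, j, k, l - 1) + k,      # delete right column of b's corner
            # compare the two bottom rows (restricted to the corners)
            rc(i - 1, j, k - 1, l) + edit_distance(a[i - 1][:j], b[k - 1][:l]),
            # compare the two right columns (restricted to the corners)
            rc(i, j - 1, k, l - 1)
            + edit_distance(column(a, j - 1)[:i], column(b, l - 1)[:k]),
        )

    return rc(nr, nc, mr, mc)


a = ["abcd", "efgh", "ijkl", "mnop"]
b = ["efgh", "ijkl", "mnop", "qrst"]
print(rc_distance(a, a), rc_distance(a, b))
```

Every row-wise alignment used by R is also a valid sequence of RC cases, so for this example the RC value cannot exceed the row distance of 8.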

Nevertheless, this distance does not handle cases where we want to change
a row and a column at the same time (for example, a motivation could be scaling).
For that we use the L-shape mentioned earlier. So, we can also decompose the
border of a subimage using L-shapes, with the same extensions as for
rows or columns. To compare two L-shapes we see them as two one-dimensional
strings. Then we have the following cases to find the minimal decomposed distance:

- $L_{i-1,j-1,k,\ell} + i + j - 1$ and $L_{i,j,k-1,\ell-1} + k + \ell - 1$, which corresponds to
removing an L-shape in a subimage; and

- $L_{i-1,j-1,k-1,\ell-1} + ed(LS_{i,j}(a), LS_{k,\ell}(b))$, which corresponds to comparing
two L-shapes.

The boundary conditions are the same as for the RC measure and the final distance
is similarly given by $L(a,b) = L_{n_r,n_c,m_r,m_c}$. Figure 1 shows the decompositions
associated with L. We will see later that this definition can be simplified by using
the fact that one row and one column are considered at the same time when
using L-shapes.

Fig. 1. Decomposition used in RC (left, 6 cases) and L (right, 3 cases).



Finally, we can have a general distance All(a,b) that uses both decompositions
(RC and L) at the same time, computing the minimal value over all possible
cases. It is easy to show that $KS(a,b) \ge R(a,b) \ge RC(a,b) \ge All(a,b)$ and
that $L(a,b) \ge All(a,b)$, because the cases of each measure are a subset of those
of the next. On the other hand, there are cases where RC(a,b) is less than L(a,b)
and vice versa. This is shown in Figure 2, together with other examples, where each
color is a different symbol. The last example shows that combining RC and L can
actually lead to a distance smaller than that of either measure alone.



a) KS = 21, R = 14, RC = 10, L = 20, All = 10

b) KS = 4, R = 4, RC = 3, L = 2, All = 2

c) KS = 9, R = 9, RC = 9, L = 9, All = 8

Fig. 2. Three examples for our new measures.

4 Computing the Distances

For the sake of simplicity and without loss of generality, assume that $n_r = n_c =
m_r = m_c = m$. A direct implementation for R would take $O(m^4)$ time and
$O(m^2)$ space, while RC and L would require $O(m^6)$ time and $O(m^4)$ space.
The latter, in particular, is prohibitive even for small images. However, this can
be improved. The space is easily reduced to $O(m)$ for R and to $O(m^3)$ for RC
and L by noticing that we only need to store the boundary of the matrices of
the dynamic programming computation, as they are computed incrementally.

Moreover, the computation of L can be simplified further by noticing that, in
the best decomposition, i − j and k − ℓ are always constant (in fact, equal
to $n_r - n_c$, which for square images is 0). This means that only a quadratic
number of entries must be computed, which implies a matrix boundary of size
$O(m)$ and a running time of $O(m^4)$.

We can also improve the computation of RC by noticing that the edit distances
of the prefixes involved can also be computed incrementally. That is, we store
the distances between each two last prefixes computed and we use them when we
extend the distance between two prefixes by one or two characters. This needs
additional space which matches the improved space bound mentioned before.
Therefore, each edit distance needed in the dynamic programming computation
can be computed in constant time, reducing the total time to $O(m^4)$ for RC,
and hence for All, considering that L requires the same time. These optimizations
allow us to handle patterns of reasonable size (say up to 50 × 50).

Table 1 summarizes the space and time complexity obtained for all the measures,
including KS. We can see that our measures need only one order of magnitude
more time than KS, using the same space, except for RC and All.



Measure   Time    Space
KS        $m^3$   $m$
R, C      $m^4$   $m$
L         $m^4$   $m$
RC        $m^4$   $m^3$
All       $m^4$   $m^3$

Table 1. Time and space complexity to compute the distance.

5 Two Dimensional Approximate Pattern Matching

Now we discuss approximate two-dimensional pattern matching. In this section,
image a is the text and image b is the pattern. We use k to denote the maximum
number of errors allowed. For simplicity we use $n_c = n_r = n$ and $m_c = m_r = m$.
The straightforward technique to search for an image in a larger image is to
scan the larger image (the text) sequentially, computing its distance to
the pattern. For R, we scan the text from top to bottom, using m columns
each time, from column s to s + m − 1, starting with s = 1 and ending with
s = n − m + 1. For each s, we use the same dynamic programming formulas,
changing the boundary conditions to allow the pattern to match at any possible
row i. That is, we use $R_{i,0} = 0$ and $R_{0,j} = j \cdot m_c$. We report a match whenever we
have at most k errors and we have considered all the rows of the pattern, that
is, when $R_{i,m_r} \le k$. For each scan, the time complexity is $O(nm^3)$ and we repeat
that n times. Therefore, the total worst-case time is $O(n^2m^3)$, using $O(m)$ extra
space. This algorithm also works for C by exchanging rows and columns.

Fig. 3. Filtering algorithm to find potential matches. (Legend: 1-dimensional multipattern; pattern row i found; possible position of an approximate occurrence; text area to verify with dynamic programming.)
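The sequential scan for R described in this section can be sketched as follows. This is a minimal illustration under the assumption that text and pattern are lists of equal-length strings; the function name `search_rows` and the reporting convention (0-based pairs of last text row and first text column) are hypothetical choices, not taken from the paper.

```python
def edit_distance(s, t):
    """Unit-cost edit distance between two sequences."""
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, start=1):
        cur = [i]
        for j, ct in enumerate(t, start=1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (cs != ct)))
        prev = cur
    return prev[-1]


def search_rows(text, pattern, k):
    """Return (end_row, start_col) pairs where R_{i, m_r} <= k in a column strip."""
    n_rows, n_cols = len(text), len(text[0])
    mr, mc = len(pattern), len(pattern[0])
    matches = []
    for s in range(n_cols - mc + 1):              # one strip of mc columns per scan
        strip = [row[s:s + mc] for row in text]
        prev = [j * mc for j in range(mr + 1)]    # R_{0,j} = j * m_c
        for i in range(1, n_rows + 1):
            cur = [0]                             # R_{i,0} = 0: a match may start at any row
            for j in range(1, mr + 1):
                cur.append(min(prev[j] + mc,      # delete a text row of the strip
                               cur[j - 1] + mc,   # insert a pattern row
                               prev[j - 1] + edit_distance(strip[i - 1], pattern[j - 1])))
            if cur[mr] <= k:
                matches.append((i - 1, s))
            prev = cur
    return matches


text = ["aaaaa", "aaxya", "aazwa", "aaaaa", "aaaaa"]
pattern = ["xy", "zw"]
print(search_rows(text, pattern, 0))  # [(2, 2)]
```

Only two rows of the dynamic programming matrix are kept per strip, which corresponds to the $O(m)$ extra space mentioned in the text.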



Now we consider an algorithm that is fast on average. For the R (or C) measure we
can use the same fast expected-time algorithm of Baeza-Yates and Navarro [4].
This algorithm uses a filter that searches for all the pattern rows (or columns) with
a multiple approximate string matching algorithm to find potential matching
areas (see Figure 3). We only change the verification phase (each potential area
found by the filter must be verified using the $O(n^2m^3)$ worst-case searching
algorithm described before) by using R (or C) to compute the distance in a
fixed area (see Figure 3). Because this algorithm works up to a certain threshold
level, below which the verification time is negligible, the fact that computing R
(or C) takes $O(m)$ times as long as computing KS only changes the maximal k
for which the algorithm performs well. Because the maximal k depends on the
inverse of the logarithm of the cost of the verification algorithm plus two, the
bound on k changes from $k < m(m+1)/(5 \log_\sigma m)$ to $k < m(m+1)/(7 \log_\sigma m)$.

Therefore, we obtain the same average-case time bound of $O(n^2 k \log_\sigma m / m^2)$
and almost the same error level bound for which the expected time result is
valid, that is, $k < m(m+1)/(7 \log_\sigma m)$, using $O(m)$ extra space. This expected
time is optimal [9]. How to extend these results to the other measures is still an
open problem.

6 Concluding Remarks 

Our measures can be easily extended to more dimensions. For d dimensions we
use (d − 1)-dimensional strings for the decompositions. The only drawback is
that the number of cases grows exponentially with the number of dimensions.
Then, computing All(a^b) for d- dimensional strings would require 0(2^n^^) time 
and O(n^) space. 

An open problem is to design optimal worst-case time algorithms for approximate
searching using the new measures, that is, achieving $O(n^2)$ time
complexity. Also, finding fast filtering algorithms for the other
measures is a matter for future research.

None of the new measures can handle scaling transformations or
rotations. A more realistic distance can be defined using the following idea, which
defines the largest common image of two images and thereby generalizes the
concept of the longest common subsequence of one-dimensional strings. Given two
images, find a set of position pairs that match exactly in both images, subject to
the following restrictions:

1. the sets of positions for the same pattern are disjoint;

2. a suitable order given by the position values is the same for both images (for
example, image pixels can be sorted by their i + j value, using the value of
i in the case of ties); and

3. the total size of the set of positions is maximized.

For the edit distance, condition 3 has to be changed to:

3. the number of mismatches, insertions and deletions needed to obtain
the set of matching positions is minimized.

This model may match a rotated pattern, because no corner is fixed. Figure 4
gives an example. All pieces of the pattern not in the text correspond to deletions
and mismatches and should be counted. In the text, black regions are not
counted, because they correspond to mismatches. All other pieces are insertions in
the pattern. It is not clear that the minimal string editing solution gives the
same answer as the largest common set of subimages. Also, it could be argued
that characters inserted/deleted on external borders should not be counted as
errors.

Fig. 4. Example of largest common image.

The approximate two-dimensional pattern matching problem can be stated
as usual using the above definition: search for all rectangular subimages
of the text that have edit distance at most k to the pattern. An alternative
definition would be to find all pieces of the text that have at least $m^2 - k$ matching
positions with the pattern.

Abstract. Indexing schemes for grids based on space-filling curves (e.g.,
Hilbert indexings) find applications in numerous fields. Hilbert curves
yield the simplest and most popular scheme. We extend the concept of
curves with Hilbert property to arbitrary dimensions and present first
results concerning their structural analysis that also simplify their applicability.
As we show, Hilbert indexings can be completely described and
analyzed by “generating elements of order 1”, thus, in comparison with
previous work, reducing their structural complexity decisively.



1 Introduction 

Discrete multi-dimensional spaces are of increasing importance. They appear in
various settings such as combinatorial optimization, parallel processing, image
processing, geographic information systems, database systems, and data structures.
In many applications it is necessary to number the points of a discrete
multi-dimensional space (or, equivalently, a grid) by an indexing scheme mapping
each point bijectively to a natural number in the range between 1 and
the total number of points in the space. Often it is desirable that this indexing
scheme preserves some kind of locality, that is, close-by points in the space are
mapped to close-by numbers or vice versa. For this purpose, indexing schemes
based on space-filling curves have proven to be of high value [4,5,6,7,8,9].

In this paper we study Hilbert indexings, perhaps the most popular space-filling
indexing schemes. Properties of 2D and 3D Hilbert indexings have been
extensively studied recently [4,5,6,7,8,9,10]. However, most of the work so far has
focused on empirical studies. Up to now, little attention has been paid to the
theoretical study of structural properties of multi-dimensional Hilbert curves,
the focus of this paper. Whereas there is only one 2D Hilbert curve modulo
symmetry, there are many possibilities to define Hilbert curves in the 3D
setting [4,9]. The advantage of Hilbert curves is their simple structure
(compared to other curves).

Our results can be sketched briefly as follows. We generalize the notion of
Hilbert indexings to arbitrary dimensions. We clarify the concept of Hilbert
curves in multi-dimensional spaces by providing a natural and simple mathematical
formalism that allows combinatorial studies of multi-dimensional Hilbert
indexings. For reasons of (geometrical) clarity, we base our formalism on permutations
instead of, e.g., matrices or other formalisms [2,3,4,10]. We thus obtain the
following insight: Space-filling curves with Hilbert property can be completely
described by simple generating elements and permutations operating on them.
Structural questions for Hilbert curves in arbitrary dimensions can be decided
by reducing them to basic generating elements. Putting it in catchy terms, one
might say that for Hilbert indexings what holds “in the large” (i.e., for large
side-length) can already be detected “in the small” (i.e., for side-length 2). In
particular, this provides a basis for mechanized proofs of locality of curves with
Hilbert property (cf. [9]). In addition, this observation allows the identification of
seemingly different 3D Hilbert indexings [4], the generalization of a locality result
of Gotsman and Lindenbaum [6] to a larger class of multi-dimensional indexing
schemes, and the determination that there are exactly $6 \cdot 2^8 = 1536$ structurally
different 3D Hilbert curves. The latter clearly generalizes and answers Sagan’s
quest for describing 3D Hilbert curves [10]. Finally, we provide an easy recursive
formula for computing Hilbert indexings in arbitrary dimensions and sketch a
recipe for how to construct an r-dimensional Hilbert curve for arbitrary r in an
easy way from two (r − 1)-dimensional ones. Some missing details and proofs
can be found in the full version of the paper [1].



2 Preliminaries 

We focus our attention on cubic grids, that is, r-dimensional grids of side-length n. An
r-dimensional (discrete) curve C is simply a bijective mapping $C : \{1,\dots,n^r\} \to
\{1,\dots,n\}^r$. Note that, by definition, we do not require a curve to be continuous.
A curve C is called continuous if it forms a Hamilton path through the $n^r$ grid
points. An r-dimensional cubic grid is said to be of order k if it has side-length $2^k$.
Analogously, a curve C has order k if its range is a cubic grid of order k.

Fig. 1. The generator $Hil_1^2$ and its canonical corner-indexing $\widetilde{Hil}_1^2$.

Fig. 2. Construction scheme for the 2D Hilbert indexing.



Fig. 1 shows the smallest 2D continuous curve indexing a grid of size 4. This
curve can be found in Hilbert’s original work (see [11]) as a constructing unit
for a whole family of curves. Fig. 2 shows the general construction principle for
these so-called Hilbert curves: For any k ≥ 1, four Hilbert indexings of size $4^k$
are combined into an indexing of size $4^{k+1}$ by rotating and reflecting them in
such a way that concatenating the indexings yields a Hamilton path through
the grid. One of the main features of the Hilbert curve is its “self-similarity”.
Here “self-similar” shall simply mean that the curve can be generated by putting
together identical (basic construction) units, only applying rotation and reflection
to these units. In a sense, the Hilbert curve is the “simplest” self-similar,
recursive, locality-preserving indexing scheme for square meshes of size $2^k \times 2^k$.

3 Formalizing Hilbert curves in r dimensions 

In this section, we generalize the construction principle of 2D Hilbert curves to 
arbitrary dimensions in a rigorous, mathematically precise way. 

3.1 Classes of Self-Similar Curves and their generators 

Let $V_r := \{x_1 x_2 \cdots x_r \mid x_i \in \{0,1\}\}$ be the set of all $2^r$ corners of an
r-dimensional cube coded in binary. Moreover, let $\mathcal{I} : V_r \to \{1,\dots,2^r\}$ denote
an arbitrary indexing of these corners. To describe the orientation of subcurves
inside a curve of higher order, we want to use symmetry mappings, which can be
expressed via suitable permutations operating on such corner-indexings. Observe
that any r-dimensional curve $C_1$ of order 1 naturally induces an indexing of these
corners (see Fig. 1 and Fig. 3). We call the obtained corner-indexing the canonical
one and denote it by $\widetilde{C}_1 : V_r \to \{1,\dots,2^r\}$. Furthermore, let $W_{\mathcal{I}}$ denote the
group of all permutations (operating on $\mathcal{I}$) that describe rotations and reflections
of the r-dimensional cube. In other words, $W_{\mathcal{I}}$ is the set of all permutations that
preserve the neighborhood-relations n(i, j) of the corner-indexing $\mathcal{I}$:

$$W_{\mathcal{I}} := \{\tau \in Sym(2^r) : n(i,j) = n(\tau(i),\tau(j)) \;\; \forall i,j \in \{1,\dots,2^r\}\}.$$

For a given permutation $\tau \in W_{\mathcal{I}}$, we sometimes write $(\tau : \mathcal{I})$ in order to
emphasize that τ is operating on a cube with corner-indexing $\mathcal{I}$. The point here is
that once we have fixed a certain corner-indexing $\mathcal{I}$, the set $W_{\mathcal{I}}$ will provide all
necessary transformations to describe a construction principle of how to generate
curves of higher order by piecing together a suitable curve of lower order. Obviously,
each permutation $(\tau : \mathcal{I})$ acting on a given corner-indexing $\mathcal{I}$ canonically
induces a bijective mapping on a cubic grid of order k. Subsequently, we do not
distinguish between a permutation and the corresponding mapping on a grid.
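The group of cube symmetries, written as permutations of corner indices, can be enumerated explicitly for small r. The sketch below assumes that the neighborhood relation n(i, j) means "the corners indexed i and j are joined by an edge of the cube" and uses the standard fact that the symmetries of the r-cube are the $2^r \cdot r!$ coordinate permutations combined with coordinate flips. All function names and the 2D corner-indexing `IDX_2D` are hypothetical choices for illustration.

```python
from itertools import permutations, product


def cube_symmetries(r):
    """All rotations and reflections of the r-cube as maps on corners (bit tuples)."""
    syms = []
    for perm in permutations(range(r)):
        for flips in product((0, 1), repeat=r):
            syms.append(lambda v, perm=perm, flips=flips:
                        tuple(v[perm[d]] ^ flips[d] for d in range(r)))
    return syms


def lexicographic_indexing(r):
    """One arbitrary corner-indexing: corners numbered 1..2^r in lexicographic order."""
    return {v: n + 1 for n, v in enumerate(product((0, 1), repeat=r))}


def index_permutations(r, indexing):
    """Symmetries expressed on indices: entry i-1 of each tuple is the image of index i."""
    corner_of = {i: v for v, i in indexing.items()}
    return {tuple(indexing[sym(corner_of[i])] for i in range(1, 2 ** r + 1))
            for sym in cube_symmetries(r)}


def preserves_neighbourhood(pi, indexing):
    """Check n(i, j) = n(pi(i), pi(j)) for all i, j, with n meaning 'joined by an edge'."""
    corner_of = {i: v for v, i in indexing.items()}

    def adjacent(i, j):
        return sum(x != y for x, y in zip(corner_of[i], corner_of[j])) == 1

    idx = sorted(corner_of)
    return all(adjacent(i, j) == adjacent(pi[i - 1], pi[j - 1]) for i in idx for j in idx)


# Hypothetical 2D corner-indexing that walks around the square.
IDX_2D = {(0, 0): 1, (0, 1): 2, (1, 1): 3, (1, 0): 4}
group_2d = index_permutations(2, IDX_2D)
print(len(group_2d), len(index_permutations(3, lexicographic_indexing(3))))  # 8 48
```

With this indexing, the transposition (2 4) appears as the tuple (1, 4, 3, 2) and (1 3) as (3, 2, 1, 4); both are reflections of the square across a diagonal.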

We partition an r-dimensional cubic grid of order k into $2^r$ subcubes of
order k − 1. For each $x_1 \cdots x_r \in V_r$ we therefore set

$$p^{(k)}_{x_1 \cdots x_r} := (x_1 \cdot 2^{k-1}, \dots, x_r \cdot 2^{k-1}) \in \{0,\dots,2^k - 1\} \times \dots \times \{0,\dots,2^k - 1\}$$

to be the “lower-left corner” of such a subcube. Let $C_{k-1}$ be an r-dimensional
curve of order k − 1 (k ≥ 2). Our goal is to define a “self-similar” curve $C_k$ of
order k by putting together $2^r$ pieces of type $C_{k-1}$. Let $\mathcal{I} : V_r \to \{1,\dots,2^r\}$ be
a corner-indexing. We intend to arrange the $2^r$ subcurves of type $C_{k-1}$ “along”
$\mathcal{I}$. The position of the $\hat\imath$-th (where $\hat\imath \in \{1,\dots,2^r\}$) subcurve inside $C_k$ can
formally be described with the help of the grid-points $p^{(k)}_{\mathcal{I}^{-1}(\hat\imath)}$. Bearing in mind the
classical construction principle for the 2D Hilbert indexing, the orientation of the
constructing curve $C_{k-1}$ inside $C_k$ can be expressed by using symmetric transformations
(that is, reflections and rotations). For any sequence of permutations
$\tau_1,\dots,\tau_{2^r} \in W_{\mathcal{I}}$ we therefore define

$$C_k(i) := (\tau_{\hat\imath} : \mathcal{I}) \circ C_{k-1}\bigl(i \bmod (2^{k-1})^r\bigr) + p^{(k)}_{\mathcal{I}^{-1}(\hat\imath)} \qquad (1)$$

where $i \in \{1,\dots,(2^k)^r\}$ and $\hat\imath = (i-1) \;\mathrm{div}\; (2^{k-1})^r + 1$. The geometric intuition
behind this is that the curve $C_k$ can be partitioned into $2^r$ components of
the form $C_{k-1}$ (reflected or rotated in a suitable way). These subcurves are arranged
inside $C_k$ “along” the given corner-indexing $\mathcal{I}$. The orientation of the
$\hat\imath$-th subcurve inside $C_k$ is described by the effect of $\tau_{\hat\imath}$ operating on $\mathcal{I}$.

Definition 1. Whenever two r-dimensional curves $C_{k-1}$ of order k − 1 and $C_k$
of order k satisfy equation (1) for a given sequence of permutations $\tau_1,\dots,\tau_{2^r} \in
W_{\mathcal{I}}$ (operating on the corner-indexing $\mathcal{I} : V_r \to \{1,\dots,2^r\}$), we will write

$$C_{k-1} \;\overset{(\tau_1,\dots,\tau_{2^r})}{\underset{\mathcal{I}}{\lhd}}\; C_k$$

and call $C_{k-1}$ the constructor of $C_k$.

Our final goal is to iterate this process starting with a curve $C_1$ of order 1.
It is only natural, and in our opinion “preserves the spirit of Hilbert”, to fix the
corner-indexing according to the structure of the defining curve $C_1$. Hence, in
this situation we can specify our $\mathcal{I}$ to be the canonical corner-indexing $\widetilde{C}_1$. By
successively repeating the construction principle in equation (1) k times, we
obtain a curve of order k.



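Definition 2. Let $\mathcal{C} = \{C_k \mid k \ge 1\}$ be a family of r-dimensional curves
of order k. We call $\mathcal{C}$ a Class of Self-Similar Curves (CSSC) if there exists a
sequence of permutations $\tau_1,\dots,\tau_{2^r} \in W_{\widetilde{C}_1}$ (operating on the canonical
corner-indexing $\widetilde{C}_1$) such that for each curve $C_k$ it holds that

$$C_1 \;\overset{(\tau_1,\dots,\tau_{2^r})}{\underset{\widetilde{C}_1}{\lhd}}\; C_2 \;\overset{(\tau_1,\dots,\tau_{2^r})}{\underset{\widetilde{C}_1}{\lhd}}\; \cdots \;\overset{(\tau_1,\dots,\tau_{2^r})}{\underset{\widetilde{C}_1}{\lhd}}\; C_k.$$

In this case, $C_1$ is called the generator of the CSSC $\mathcal{C}$ and we define the set
$\mathcal{H}(C_1, (\tau_1,\dots,\tau_{2^r})) := \{C_k \mid k \ge 1\}$ to be the CSSC generated by $C_1$ and
$\tau_1,\dots,\tau_{2^r}$. A CSSC $\mathcal{C} = \{C_k \mid k \ge 1\}$ is called a Class with Hilbert Property
(CHP) if all curves $C_k$ are continuous.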

Note that the CSSC $\mathcal{H}(C_1, (\tau_1,\dots,\tau_{2^r}))$ is well-defined, because any CSSC
is uniquely determined by its generator $C_1$ and the choice of the permutations
$\tau_1,\dots,\tau_{2^r} \in W_{\widetilde{C}_1}$. Our concept for multi-dimensional CHPs only makes use of
the essential tools which can be found in Hilbert’s context (cf. [11]), namely
rotation and reflection. We deliberately avoid more complicated structures (e.g.,
the use of different sequences of permutations in each inductive step, or the use of
several generators for the constructing principle) in order to maintain conceptual
simplicity and ease of construction and analysis. However, the theory which we
develop in this paper is not necessarily restricted to the continuous case. We end
this subsection with an example. One easily checks that the classical 2D Hilbert
indexing can be described via $\mathcal{H}(Hil_1^2, ((2\,4), id, id, (1\,3))) = \{Hil_k^2 \mid k \ge 1\}$,
where the generator $Hil_1^2$ is given in Fig. 1. As Theorem 2 will show, this is the
only CHP of dimension 2 “modulo symmetry.”
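The 2D example can be made concrete with a short recursive sketch. It assumes 0-based grid coordinates and a generator that visits the corners in the order (0,0), (0,1), (1,1), (1,0); under that assumption, (2 4) is the reflection across the main diagonal and (1 3) the reflection across the anti-diagonal. The figure fixing the actual orientation is not reproduced here, so this orientation and the names `hilbert_points` and `is_continuous` are illustrative only.

```python
def hilbert_points(k):
    """Grid points of a 2D Hilbert curve of order k, listed in visiting order."""
    if k == 0:
        return [(0, 0)]                  # degenerate start of the recursion
    prev = hilbert_points(k - 1)
    s = 2 ** (k - 1)                     # side-length of each of the four subgrids
    # First subcurve: reflected across the main diagonal (permutation (2 4)).
    lower_left = [(y, x) for x, y in prev]
    # Second and third subcurves: unchanged (id), only translated.
    upper_left = [(x, y + s) for x, y in prev]
    upper_right = [(x + s, y + s) for x, y in prev]
    # Fourth subcurve: reflected across the anti-diagonal (permutation (1 3)).
    lower_right = [(2 * s - 1 - y, s - 1 - x) for x, y in prev]
    return lower_left + upper_left + upper_right + lower_right


def is_continuous(points):
    """True if consecutive points are grid neighbours, i.e. a Hamilton path."""
    return all(abs(x1 - x2) + abs(y1 - y2) == 1
               for (x1, y1), (x2, y2) in zip(points, points[1:]))


print(hilbert_points(1))                 # [(0, 0), (0, 1), (1, 1), (1, 0)]
print(is_continuous(hilbert_points(4)))  # True
```

Each level starts at the corner (0, 0) and ends at the corner $(2^k - 1, 0)$, which is what allows the four reflected copies to be concatenated into a Hamilton path.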



3.2 Disturbing the generator of a CSSC 

In this subsection we analyze the effects of disturbing the generator of a CSSC
by a symmetric mapping. We will see that any disturbance of the generator
is inherited by the whole CSSC in a very canonical way. The converse also
holds: if two different CSSCs show a certain similarity in one of their
members, this similarity can already be found in the structure of the corresponding
generators. We illustrate this by the following diagram. Consider two CSSCs
$\mathcal{H}(C_1, (\tau_1,\dots,\tau_{2^r})) = \{C_k \mid k \ge 1\}$ and $\mathcal{H}(D_1, (\tau_1,\dots,\tau_{2^r})) = \{D_k \mid k \ge 1\}$,
respectively.¹ Suppose there is a similarity at a certain stage of the construction,
i.e., for some $k_0$ the curves $C_{k_0}$ and $D_{k_0}$ can be obtained from each other by a
similarity transformation Φ. The investigations in this section will show that the
inner structure of CSSCs is strong enough to yield the same behavior at
every stage of the construction.



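$$\begin{array}{ccccccccc}
C_1 & \lhd & C_2 & \lhd & \cdots & \lhd & C_k & \lhd & \cdots \\
\downarrow \Phi & & \downarrow \Phi & & & & \downarrow \Phi & & \\
D_1 & \lhd & D_2 & \lhd & \cdots & \lhd & D_k & \lhd & \cdots
\end{array}$$

(Each $\lhd$ in the upper row is labelled with $(\tau_1,\dots,\tau_{2^r})$ and $\widetilde{C}_1$, each $\lhd$ in the lower row with $(\tau_1,\dots,\tau_{2^r})$ and $\widetilde{D}_1$.)

Consequently, for issues like structural behavior, it will be sufficient to analyze
the generating elements of a CSSC only, since we find all necessary information
encoded here. We start with a simple observation concerning the behavior of the
construction principle of Definition 1 under the “symmetric disturbance” of a
constructor. We omit the proof.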



Lemma 1. Let $C_{k-1}$ and $C_k$ be curves of order k − 1 and k, respectively. Suppose
$C_{k-1}$ is the constructor of $C_k$, i.e., $C_{k-1} \overset{(\tau_1,\dots,\tau_{2^r})}{\underset{\mathcal{I}}{\lhd}} C_k$ for a sequence of
permutations $\tau_1,\dots,\tau_{2^r} \in W_{\mathcal{I}}$ (acting on a given corner-indexing $\mathcal{I}$). Then for
arbitrary $\phi \in W_{\mathcal{I}}$ we have

$$(\phi : \mathcal{I}) \circ C_{k-1} \;\overset{(\tau_1 \circ \phi^{-1},\dots,\tau_{2^r} \circ \phi^{-1})}{\underset{\mathcal{I}}{\lhd}}\; C_k.$$

Whereas Lemma 1 describes the influence of disturbing the constructor,
we now, in a second step, analyze how transforming the underlying
corner-indexing influences the construction principle. We will need such a result,
since two different CSSCs (by definition) come with two different corner-indexings,
each of which is given by the underlying generator. Again we omit the
proof.

¹ Note that the τ’s used in the definition of both CSSCs yield completely different
automorphisms on the grid. Whereas in the first case they refer to the corner-indexing
$\widetilde{C}_1$, in the second case they act on the corner-indexing $\widetilde{D}_1$, given by the generator $D_1$.

Lemma 2. Given the assumptions of Lemma 1 (that is, $C_{k-1} \overset{(\tau_1,\dots,\tau_{2^r})}{\underset{\mathcal{I}}{\lhd}} C_k$
for two curves $C_{k-1}$ and $C_k$ of successive order), then for arbitrary $\phi \in W_{\mathcal{I}}$ and
the modified corner-indexing $\mathcal{K} := \phi^{-1} \circ \mathcal{I}$ with $\Phi = (\phi : \mathcal{I}) = (\phi : \mathcal{K})$ we have²

$$C_{k-1} \;\overset{(\tau_1 \circ \phi,\dots,\tau_{2^r} \circ \phi)}{\underset{\mathcal{K}}{\lhd}}\; \Phi \circ C_k.$$

Lemma 1 and Lemma 2 now allow the proof of the main result of this section. For
its illustration we refer to the diagram at the beginning of this section. Also
recall the point made in the footnote there.

Theorem 1. Let $C_1$ be the generator of the CSSC $\mathcal{H}(C_1, (\tau_1,\dots,\tau_{2^r})) = \{C_k \mid
k \ge 1\}$ and $D_1$ the generator of the CSSC $\mathcal{H}(D_1, (\tau_1,\dots,\tau_{2^r})) = \{D_k \mid k \ge
1\}$. For an arbitrary permutation $\phi \in W_{\widetilde{C}_1}$ and the corresponding symmetric
mapping $\Phi = (\phi : \widetilde{C}_1) = (\phi : \widetilde{D}_1)$, the following statements are equivalent:

(i) $\Phi \circ C_{k_0} = D_{k_0}$ for some $k_0 \ge 1$.

(ii) $\Phi \circ C_k = D_k$ for all $k \ge 1$.



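Proof. (ii) ⇒ (i) is trivial. For (i) ⇒ (ii) we first show that statement (ii) is
true for the generators $C_1$ and $D_1$: If $k_0 > 1$ we can divide the cubic grid of
order $k_0$ into $2^r$ subgrids of order $k_0 - 1$. By the construction principle for CSSCs,
the curves $C_{k_0}$ and $D_{k_0}$ traverse these subgrids “along” the canonical corner-indexings
$\widetilde{C}_1$ resp. $\widetilde{D}_1$. Since, by assumption, $\Phi \circ C_{k_0} = D_{k_0}$, the corresponding
relation also holds true for the corner-indexings $\widetilde{C}_1$ and $\widetilde{D}_1$, which finally yields
the validity of the equation $\Phi \circ C_1 = D_1$, because of the isomorphisms $C_1 \cong \widetilde{C}_1$
resp. $D_1 \cong \widetilde{D}_1$. We proceed by proving (ii) by induction on k. Assuming that
$D_k = \Phi \circ C_k$, we show this relation for k + 1 by applying Lemma 1 and Lemma
2. Since $\{C_k \mid k \ge 1\}$ is a CSSC, we get

$$C_k \overset{(\tau_1,\dots,\tau_{2^r})}{\underset{\widetilde{C}_1}{\lhd}} C_{k+1}
\;\overset{\text{Lemma 1}}{\Longrightarrow}\;
\underbrace{\Phi \circ C_k}_{= D_k} \overset{(\tau_1 \circ \phi^{-1},\dots,\tau_{2^r} \circ \phi^{-1})}{\underset{\widetilde{C}_1}{\lhd}} C_{k+1}
\;\overset{\text{Lemma 2}}{\Longrightarrow}\;
D_k \overset{(\tau_1,\dots,\tau_{2^r})}{\underset{\widetilde{D}_1}{\lhd}} \Phi \circ C_{k+1}$$

where the last relation makes use of $\widetilde{D}_1 = \phi^{-1} \circ \widetilde{C}_1$, which we immediately
obtain from the given equation $D_1 = \Phi \circ C_1$.³ This implies $D_{k+1} = \Phi \circ C_{k+1}$
because of the CSSC-property of $\{D_k \mid k \ge 1\}$. □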



In particular, the result of Theorem 1 implies that any questions concerning 
the structural similarity of two CSSCs can be reduced to the analysis of their 
generators. 

² The fact that the corner-indexing is disturbed by $\phi^{-1}$ instead of φ is due to technical
reasons only.

³ A disturbance by Φ implies a transformation of the corner-indexings by $\phi^{-1}$, which
can be easily checked.

Fig. 3. Continuous 3D generators $Hil_1^3.x$ (panels: generator $Hil_1^3.A$, generator $Hil_1^3.B$, generator $Hil_1^3.C$) and their canonical corner-indexings
$\widetilde{Hil}_1^3.x$.

4 Applications: Computing and analyzing CHPs 

First in this section, we attack the classification of all structurally different CHPs
for higher dimensions. Whereas we can provide concrete combinatorial results
for the 2D and 3D cases, the high-dimensional cases appear to be much more
difficult. The basic tool for such an analysis, however, is given by Theorem 1.
The following theorem justifies the naming “class with Hilbert property” (CHP).

Theorem 2. The classical 2D Hilbert indexing $\mathcal{H}(Hil_1^2, ((2\,4), id, id, (1\,3)))$ is
the only CHP of dimension 2 modulo symmetry.

Proof. Due to Theorem 1 it suffices to show that $Hil_1^2$ is the only continuous
2D generator, which is obvious. In addition, we have to check whether there is
another sequence of permutations such that 4 generators $Hil_1^2$ can be arranged in
a grid of order 2 along the canonical corner-indexing $\widetilde{Hil}_1^2$ in a continuous way. A
simple combinatorial consideration shows that no other sequence of permutations
yields a continuous curve of order 2 whose starting- and endpoints are located
at corners of the grid. However, any constructor for a continuous curve of higher
order must have this property. □

What about the 3D case? The analysis of the “Simple Indexing Schemes” (which
are related to our CHPs) in Chochia and Cole [4] already shows that the number
of CHPs in the 3D case grows drastically compared to the 2D setting. However,
by our analysis, many of the “Simple Indexing Schemes” in [4] now turn out to be
identical modulo symmetry. We state the following classification theorem, which
treats the 3D case entirely. It also generalizes and answers work of Sagan [10].

Theorem 3. For the 3D case there are $6 \cdot 2^8 = 1536$ structurally different (that
is, not identical modulo reflection and rotation) CHPs. These types are listed in
Table 1.

Table 1. Description of all 3-dimensional CHPs.

generator   | version | $\tau_1$                    | $\tau_2$                    | $\tau_3$                    | $\tau_4$
$Hil_1^3.A$ | (a)     | (2 8)(3 5) / (2 4 8)(3 5 7) | (3 7)(4 8) / (2 8 4)(3 7 5) | (3 7)(4 8) / (2 8 4)(3 7 5) | (1 3)(6 8) / (1 3)(2 4)(5 7)(6 8)
$Hil_1^3.A$ | (b)     | (2 8)(3 5) / (2 4 8)(3 5 7) | (3 7)(4 8) / (2 8 4)(3 7 5) | id / (2 4)(5 7)             | (1 7 3)(4 6 8) / (1 7 5 3)(2 8 6 4)
$Hil_1^3.A$ | (c)     | (2 8)(3 5) / (2 4 8)(3 5 7) | (3 7)(4 8) / (2 8 4)(3 7 5) | id / (2 4)(5 7)             | (1 7)(4 6) / (1 7 5)(2 6 4)
$Hil_1^3.A$ | (d)     | (2 8)(3 5) / (2 4 8)(3 5 7) | (3 7)(4 8) / (2 8 4)(3 7 5) | (3 7)(4 8) / (2 8 4)(3 7 5) | (1 3)(6 8) / (1 3)(2 4)(5 7)(6 8)
$Hil_1^3.B$ | (a)     | (2 8)(5 7) / (2 6 8)(3 5 7) | id / (2 6)(3 7)             | (3 5)(6 8) / (2 8 6)(3 7 5) | (2 8)(5 7) / (2 6 8)(3 5 7)
$Hil_1^3.B$ | (b)     | (2 8)(5 7) / (2 6 8)(3 5 7) | id / (2 6)(3 7)             | (3 5)(6 8) / (2 8 6)(3 7 5) | (3 5)(6 8) / (2 8 6)(3 7 5)

generator   | version | $\tau_5$                            | $\tau_6$                    | $\tau_7$                    | $\tau_8$
$Hil_1^3.A$ | (a)     | (1 3)(6 8) / (1 3)(2 4)(5 7)(6 8)   | (1 5)(2 6) / (1 5 7)(2 4 6) | (1 5)(2 6) / (1 5 7)(2 4 6) | (1 7)(4 6) / (1 7 5)(2 6 4)
$Hil_1^3.A$ | (b)     | (1 3 5)(2 6 8) / (1 3 5 7)(2 4 6 8) | id / (2 4)(5 7)             | (1 5)(2 6) / (1 5 7)(2 4 6) | (1 7)(4 6) / (1 7 5)(2 6 4)
$Hil_1^3.A$ | (c)     | (2 8)(3 5) / (2 4 8)(3 5 7)         | id / (2 4)(5 7)             | (1 5)(2 6) / (1 5 7)(2 4 6) | (1 7)(4 6) / (1 7 5)(2 6 4)
$Hil_1^3.A$ | (d)     | (1 3 5)(2 6 8) / (1 3 5 7)(2 4 6 8) | id / (2 4)(5 7)             | (1 5)(2 6) / (1 5 7)(2 4 6) | (1 7)(4 6) / (1 7 5)(2 6 4)
$Hil_1^3.B$ | (a)     | (1 3)(4 6) / (1 3 7)(2 4 6)         | (1 3)(4 6) / (1 3 7)(2 4 6) | id / (2 6)(3 7)             | (1 7)(2 4) / (1 7 3)(2 6 4)
$Hil_1^3.B$ | (b)     | (1 7)(2 4) / (1 7 3)(2 6 4)         | (1 3)(4 6) / (1 3 7)(2 4 6) | id / (2 6)(3 7)             | (1 7)(2 4) / (1 7 3)(2 6 4)

Proof (Sketch). Theorem 1 says that we can restrict our attention to checking
all continuous curves of order 1 which are different modulo symmetry. Given
such a continuous generator C, the total number of CHPs which can be constructed
from C is given by all possibilities of piecing together 8 (rotated or reflected)
versions of C (“subcurves”) along its canonical corner-indexing $\widetilde{C}$. By
exhaustive search, we find that there are 3 different (modulo symmetry) types
of continuous generators, namely $Hil_1^3.A$, $Hil_1^3.B$ and $Hil_1^3.C$ (see Fig. 3). As
described above, we now have to check whether there are continuous arrangements
of these generators along their canonical corner-indexings. Beginning with
type A, an exhaustive combinatorial search yields that there are 4 possible continuous
formations of $Hil_1^3.A$ along $\widetilde{Hil}_1^3.A$. All possibilities are shown in Fig. 4,
where the orientation of each subcube is given by the position of an edge (drawn
in bold lines). For each subcube there are two symmetry mappings which yield
possible arrangements for the generator within such a subgrid. The permutations
expressing these mappings are listed in Table 1. Analogously, we find
the possible arrangements for generator type B. Note that there are no more
than 2 different continuous arrangements of this generator along its canonical
corner-indexing. Finally, we easily check that $Hil_1^3.C$ cannot even be the constructor
of a continuous curve of order 2. Table 1 thus yields that there are exactly
$4 \cdot 2^8 + 2 \cdot 2^8 = 6 \cdot 2^8$ structurally different CHPs. □

Construction of an r-dimensional Hilbert curve. Without giving an
explicit proof here, we just indicate how the construction of a high-dimensional
CHP can be done inductively: A continuous generator of dimension r can be
derived simply by “joining together” two continuous generators of dimension r - 1.
As an example we give a CHP of dimension 4, whose generator Hil_1^4 is
constructed by joining together two generators Hil_1^3, version (a) (cf. Figure 3).
The generator Hil_1^4 and a suitable sequence of permutations are shown in Fig. 5.
Note that this construction principle can be extended to obtain Hilbert
indexings in arbitrary dimensions in an expressive and easy, constructive way:
Following the construction principle of Hil_1^3, version (a), first pass through
an (r - 1)-dimensional structure, then in “two steps” do a change of dimension
in the r-th dimension, and finally again pass through an (r - 1)-dimensional
structure. This method applies to finding the generators as well as to finding
the permutations.

Fig. 4. Construction principles for CHPs with generators Hil_1^3.A and Hil_1^3.B.

Recursive computation of CSSCs. Note that whenever a CSSC C = {C_k |
k ≥ 1} is explicitly given by its generator and the sequence of permutations, we
may use the recursive formula (1) of Subsection 3.1 to compute the curves C_k.
In other words, the defining formula (1) itself provides a computation scheme
for CSSCs, which is parameterized by the generating elements (generator and
sequence of permutations).

Aspects of locality. The above-mentioned parameterized formula might, for
example, also be used to investigate locality properties of CSSCs by mechanical
methods. The locality properties of Hilbert curves have already been studied in
great detail. As an example of such investigations, we briefly note a result of
Gotsman and Lindenbaum [6] for multi-dimensional Hilbert curves. In [6] they
investigate a curve $C : \{1, \dots, n^r\} \to \{1, \dots, n\}^r$ with the help of
their locality measure $L_2(C) = \max_{i \neq j} d_2(C(i), C(j))^r / |i - j|$,
where $d_2$ denotes the Euclidean metric. In their Theorem 3 they claim the
upper bound $L_2(C) \le (r + 3)^{r/2} \, 2^r$ for any r-dimensional Hilbert curve
C of order k, without precisely specifying what an r-dimensional Hilbert curve
shall be. Since the proof of their result does not utilize the special Hilbert
structure of the curve, this result can even be extended to arbitrary CSSCs.
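As a rough illustration of such a mechanical investigation, the following Python sketch evaluates the locality measure by brute force for the familiar two-dimensional Hilbert curve (r = 2). It is only a sketch under stated assumptions: the curve is produced by the standard bit-manipulation index-to-point conversion, not by the generator/permutation formalism of this paper, indices start at 0 rather than 1, and the names `hilbert_point` and `locality_l2` are illustrative.

```python
from itertools import combinations


def hilbert_point(order, d):
    """Map index d (0-based) to a point (x, y) of the 2^order x 2^order grid."""
    x = y = 0
    t = d
    s = 1
    while s < (1 << order):
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        if ry == 0:  # rotate or reflect the current quadrant
            if rx == 1:
                x, y = s - 1 - x, s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y


def locality_l2(points, r):
    """Brute-force max over i < j of d_2(C(i), C(j))^r / |i - j|."""
    best = 0.0
    for (i, p), (j, q) in combinations(enumerate(points), 2):
        dist_sq = sum((a - b) ** 2 for a, b in zip(p, q))
        best = max(best, dist_sq ** (r / 2) / (j - i))
    return best


order, r = 3, 2
curve = [hilbert_point(order, d) for d in range(4 ** order)]
value = locality_l2(curve, r)
bound = (r + 3) ** (r / 2) * 2 ** r  # the bound quoted above, 20 for r = 2
print(value, bound)
```

The quadratic loop over all index pairs is only practical for small orders; it is meant as a check of the definition, not as an efficient analysis tool.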
τ_1  = (2 16)(3 13)(6 12)(7 9)
τ_2  = (3 15)(4 16)(5 9)(6 10)
τ_3  = τ_2
τ_4  = (1 3 13 11 9 7)(2 4 14 12 10 8)(5 15)(6 16)
τ_5  = τ_4
τ_6  = (1 5 13 9)(2 6 14 10)(3 11 15 7)(4 12 16 8)
τ_7  = τ_6
τ_8  = (1 7)(4 6)(10 16)(11 13)
τ_9  = τ_8
τ_10 = (1 9 13 5)(2 10 14 6)(3 7 15 11)(4 8 16 12)
τ_11 = τ_10
τ_12 = (1 11)(2 12)(3 5 7 9 15 13)(4 6 8 10 16 14)
τ_13 = τ_12
τ_14 = (1 13)(2 14)(7 11)(8 12)
τ_15 = τ_14
τ_16 = (1 15)(4 14)(5 11)(8 10)

Fig. 5. Constructing elements for a 4-D CHP (generator and permutations).
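The permutations of Fig. 5 are given in disjoint-cycle notation on the 16 corner indices. The following Python sketch shows one way to turn that notation into explicit mappings and to sanity-check them. It rests on assumptions: the first cycle of τ_4 is read as (1 3 13 11 9 7), mirroring its companion cycle (2 4 14 12 10 8), and the helper names `parse_cycles` and `compose` are illustrative.

```python
import re


def parse_cycles(text, n):
    """Turn disjoint-cycle notation such as '(1 7)(4 6)' into a dict on 1..n."""
    perm = {i: i for i in range(1, n + 1)}
    seen = set()
    for body in re.findall(r"\(([^()]*)\)", text):
        cycle = [int(tok) for tok in body.split()]
        if seen & set(cycle) or len(set(cycle)) != len(cycle):
            raise ValueError(f"cycles are not disjoint: {text}")
        seen |= set(cycle)
        for a, b in zip(cycle, cycle[1:] + cycle[:1]):
            perm[a] = b
    return perm


def compose(p, q):
    """Return the permutation that applies q first and then p."""
    return {i: p[q[i]] for i in q}


# Distinct permutations of Fig. 5 (tau_3 = tau_2, tau_5 = tau_4, and so on).
FIG5 = {
    1: "(2 16)(3 13)(6 12)(7 9)",
    2: "(3 15)(4 16)(5 9)(6 10)",
    4: "(1 3 13 11 9 7)(2 4 14 12 10 8)(5 15)(6 16)",  # first cycle: assumed reading
    6: "(1 5 13 9)(2 6 14 10)(3 11 15 7)(4 12 16 8)",
    8: "(1 7)(4 6)(10 16)(11 13)",
    10: "(1 9 13 5)(2 10 14 6)(3 7 15 11)(4 8 16 12)",
    12: "(1 11)(2 12)(3 5 7 9 15 13)(4 6 8 10 16 14)",
    14: "(1 13)(2 14)(7 11)(8 12)",
    16: "(1 15)(4 14)(5 11)(8 10)",
}

perms = {k: parse_cycles(v, 16) for k, v in FIG5.items()}
identity = {i: i for i in range(1, 17)}
print(compose(perms[10], perms[6]) == identity)
```

With the cycles as listed, τ_10 undoes τ_6, which is a quick consistency check on the transcription.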



5 Conclusion 

Our paper lays the basis for several further research directions. For instance,
it would be tempting to determine the number of structurally different
r-dimensional curves with Hilbert property for r > 3. Moreover, a (mechanized)
analysis of locality properties of r-dimensional (r > 3) Hilbert curves is still
to be done (cf. [9]). An analysis of the construction of more complicated curves
using more generators or different permutations for different levels remains open.
Abstract. Any attempt to find connections between mathematical
properties and complexity has a strong relevance to the field of Com- 
plexity Theory. This is due to the lack of mathematical techniques to 
prove lower bounds for general models of computation. 

This work represents a step in this direction: we define a combinatorial 
property that makes Boolean functions “hard” to compute and show how 
the harmonic analysis on the hypercube can be applied to derive new 
lower bounds on the size complexity of previously unclassified Boolean 
functions. 



1 Introduction 



Any attempt to find connections between mathematical properties of functions
and their computational complexity has a strong relevance to the theory of
computation. Indeed, there is the hope that developing new mathematical techniques
could lead to discovering properties that might be responsible for lower bounds. 
The subject of this paper is related to the above general arguments, and in 
particular to showing how the Abstract Harmonic Analysis on the hypercube 
can provide some insight in our current understanding of Boolean function com- 
plexity. Our main result consists of new lower bounds on the size complexity of 
explicit functions, exactly derived by applying the above techniques. 



One of the best-known results in Circuit Complexity is that constant depth 
circuits require exponential size to compute the parity function (see [2] and [3] ) . 
Here we generalize this result to a hierarchy of previously unclassified classes of 
functions. 

This hierarchy is defined as follows. Let 0 < t < n, and let $B_n^{(t)}$ be the
class of functions, depending on n variables, that take the value 1 on exactly
$2^{n-t}$ input strings. We then divide $B_n^{(t)}$ into levels, where the k-th
level, which we denote by $B_{k,n}^{(t)}$, is defined as the subset of the
functions $f \in B_n^{(t)}$ such that any subfunction of f, depending on k
($k \ge t$) variables, takes the value 1 on $2^{k-t}$ input strings. These
definitions are made precise below.

Our main result is that $AC^0$-circuits cannot compute functions in the k-th
level of $B_n^{(t)}$, whenever $k = n - (\log n)^{\omega(1)}$ and
$t = (\log n)^{O(1)}$. More precisely, we prove that a circuit of constant
depth d requires size $\Omega\bigl(2^{\frac{1}{20}(n-k)^{1/d} - t}\bigr)$ to
compute any function in $B_{k,n}^{(t)}$, for any t and any k.



Wen-Lian Hsu, Ming- Yang Kao (Eds.): COCOON’98, LNCS 1449, pp. 339-348, 1998. 
(c) Springer- Verlag Berlin Heidelberg 1998 



340 Anna Bernasconi 



We also prove that nontrivial examples of functions exist for each level of
this hierarchy if $k \ge \frac{t-1}{t} n$, and conjecture that this bound is not
far from being asymptotically optimal.

The main tool of the lower bound proof is the harmonic analysis on the
hypercube, which yields an interesting spectral characterization of the functions
in the above hierarchy, together with a result proved in [6], stating that $AC^0$
functions have almost all of their power spectrum on the low-order coefficients.

Finally, notice that this paper generalizes results in [1], where it has been
proven that $AC^0$-circuits cannot compute strongly balanced functions. Indeed,
the class of strongly balanced functions coincides with the
$[n - (\log n)^{\omega(1)}]$-th level of the class $B_n^{(1)}$.

The rest of the paper is organized as follows. In Section 2 we provide some
of the notation we use, and recall some basic definitions. In Section 3 we give
the necessary background on the Fourier transform on the hypercube, and review
the results by Linial et al. (see [6]) about the spectral characterization of
$AC^0$ functions. Section 4 is devoted to the definition of the classes
$B_n^{(t)}$ and of their levels $B_{k,n}^{(t)}$. In Section 5 we derive a
spectral characterization of the functions in any level of $B_n^{(t)}$, and in
Section 6 we prove our main result stating that $AC^0$-circuits cannot compute
functions in the level $B_{k,n}^{(t)}$ whenever $k = n - (\log n)^{\omega(1)}$
and $t = (\log n)^{O(1)}$. In Section 7 we provide nontrivial examples of
functions in any level $B_{k,n}^{(t)}$ such that $k \ge \frac{t-1}{t} n$.
Finally, in Section 8 we provide a framework for future research.

2 Basic Definitions 

First of all, we provide some of the notation we use. 

Given a Boolean function f on n binary variables, we will use different kinds
of notation: the classical notation, where the input string is given by n binary
variables; the set notation, based on the correspondence between the set
$\{0,1\}^n$ and the power set of $\{1, 2, \dots, n\}$; the $2^n$-tuple vector
representation $f = (f_0 f_1 \dots f_{2^n-1})$, where $f_i = f(x(i))$ and
$x(i)$ is the binary expansion of i.

Unless otherwise specified, the indexing of vectors and matrices starts from 0 
rather than 1. 

We will use the notation |f| to denote the cardinality of f, that is the number
of strings accepted by f:

$$|f| = |\{w \in \{0,1\}^n \mid f(w) = 1\}|.$$

Given a binary string $w \in \{0,1\}^n$, we denote with $w^{(i)}$ the string
obtained from w by flipping its i-th bit ($1 \le i \le n$), i.e. w and $w^{(i)}$
differ only on the i-th bit, and by |w| the number of ones in w, which is
sometimes called the cardinality of the string because of the correspondence
between sets of positive integers and strings over the alphabet {0,1}. If w and
v are two binary strings of the same length, then $w \oplus v$ denotes the
string obtained by computing the exclusive or of the bits of w and v. Finally,
all the logarithms are to the base 2.
We now review some basic definitions.

$AC^0$ circuits

An $AC^0$ circuit consists of AND, OR and NOT gates, with inputs $x_1, \dots, x_n$.
Fan-in to the gates is unbounded. The size of the circuit (i.e. the number of
the gates) is bounded by a polynomial in n, and its depth is bounded by a
constant. Without loss of generality we can assume that negations occur only
as negated input variables. If negations appear higher up in the circuit we
can move them down to the inputs using De Morgan’s laws, which at most
doubles the size of the circuit. Finally, observe that we have alternating
levels of AND and OR gates, since two adjacent gates of the same type can
be collapsed into one gate (for a more detailed description, see [3]).

Restriction

A restriction $\rho$ is a mapping of the input variables to the set $\{0, 1, \star\}$, where

- $\rho(x_i) = 0$ means that we substitute the value 0 for $x_i$;

- $\rho(x_i) = 1$ means that we substitute the value 1 for $x_i$;

- $\rho(x_i) = \star$ means that $x_i$ remains a variable.

Given a function f on n binary variables, we will denote by $f_\rho$ the function
obtained from f by applying the restriction $\rho$; $f_\rho$ will be a function of the
variables $x_i$ for which $\rho(x_i) = \star$.

The domain of a restriction $\rho$, $dom(\rho)$, is the set of variables mapped to 0
or 1 by $\rho$. The size of a restriction $\rho$, $size(\rho)$, is defined as the number of
variables which were given the value $\star$, i.e. $size(\rho) = n - |dom(\rho)|$.
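The notion of restriction can be made concrete with a small Python sketch. It is illustrative only: a function is modelled as a Python callable on 0/1 arguments, a restriction as a tuple over {0, 1, "*"}, and the names `restrict` and `cardinality` are assumptions of this sketch, not notation from the paper.

```python
from itertools import product

STAR = "*"  # marks a variable that the restriction leaves free


def restrict(f, rho):
    """Apply the restriction rho to f; return the subfunction and the free positions."""
    free = [i for i, v in enumerate(rho) if v == STAR]

    def f_rho(*ys):
        x = list(rho)
        for i, y in zip(free, ys):
            x[i] = y  # plug the remaining variables back in
        return f(*x)

    return f_rho, free


def cardinality(g, arity):
    """Number of input strings accepted by g."""
    return sum(g(*x) for x in product((0, 1), repeat=arity))


def parity(*x):
    return sum(x) % 2


rho = (1, STAR, 0, STAR)  # x1 := 1, x3 := 0, x2 and x4 remain variables
sub, free = restrict(parity, rho)
size = len(free)          # size(rho) = n - |dom(rho)|
print(size, cardinality(sub, size))
```

Here the subfunction is the complement of parity on the two free variables, so it still accepts exactly half of its inputs.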

3 Abstract Harmonic Analysis and $AC^0$ Functions

We give some background on abstract harmonic analysis on the hypercube. We
refer to [5] for a more detailed exposition.

We consider Boolean functions as 0-1 valued real functions defined on the
domain $\{0,1\}^n$. They are a vector space of dimension $2^n$, and the set of
functions $\{f_x(y) = (1 \text{ if and only if } x = y)\}$, where x ranges over
$\{0,1\}^n$, is a basis. Another basis is given by the functions
$\{g_S(x) = \sum_{i \in S} x_i\}$, where the sum is modulo 2.

The Fourier coefficients of f are the coefficients of f in this basis.

More precisely, consider the space $\mathcal{F}$ of all the two-valued functions
on $\{0,1\}^n$. The domain of $\mathcal{F}$ is a locally compact Abelian group
and the elements of its range, i.e. 0 and 1, can be added and multiplied as
complex numbers. The above properties allow one to analyze $\mathcal{F}$ by using
tools from harmonic analysis. This means that it is possible to construct an
orthogonal basis set of Fourier transform kernel functions for $\mathcal{F}$. The
kernel functions of the Fourier transform are defined in terms of a group
homomorphism from $\{0,1\}^n$ to the direct product of n copies of the
multiplicative subgroup $\{\pm 1\}$ on the unit circle of the complex plane. The
functions $Q_w(x) = (-1)^{w_1 x_1} (-1)^{w_2 x_2} \cdots (-1)^{w_n x_n} = (-1)^{w^T x}$
are known as Fourier transform kernel functions, and the set
$\{Q_w \mid w \in \{0,1\}^n\}$ is an orthogonal basis for $\mathcal{F}$.
We can now define the Abstract Fourier Transform of a Boolean function f
as the rational valued function $f^*$ which defines the coordinates of f with
respect to the basis $\{Q_w(x), w \in \{0,1\}^n\}$, i.e.,

$$f^*(w) = 2^{-n} \sum_x Q_w(x) f(x).$$

Then

$$f(x) = \sum_w Q_w(x) f^*(w)$$

is the Fourier expansion of f.
It is interesting to note that the zero-order Fourier coefficient is equal to the 
probability that the function takes the value 1 , while the other Fourier coefficients 
measure the correlation between the function and the parity of subsets of its 
input bits (see [6] for more details). 

Using the binary $2^n$-tuple representation for the functions f and $f^*$, and
considering the natural ordering of the n-tuples x and w, one can derive a
convenient matrix formulation for the transform pair. Let us consider a
$2^n \times 2^n$ matrix $H_n$ whose (i,j)-th entry $h_{ij}$ satisfies
$h_{ij} = (-1)^{x(i)^T x(j)}$, where $x(i)^T x(j)$ denotes the inner product of
the binary expansions of i and j. If $f = [f_0 \, f_1 \dots f_{2^n-1}]^T$ and
$f^* = [f^*_0 \, f^*_1 \dots f^*_{2^n-1}]^T$ then, from the fact that
$H_n^{-1} = 2^{-n} H_n$, we get

$$f = H_n f^* \quad \text{and} \quad f^* = 2^{-n} H_n f.$$

Note that the matrix $H_n$ is the Hadamard symmetric transform matrix and
can be recursively defined as

$$H_1 = \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}, \qquad
H_n = \begin{pmatrix} H_{n-1} & H_{n-1} \\ H_{n-1} & -H_{n-1} \end{pmatrix}.$$
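The matrix formulation lends itself to a direct check on small examples. The Python sketch below builds the Hadamard matrix from the recursion and computes the spectrum of a truth table with exact rational arithmetic. It is a minimal illustration, quadratic in the table size; the names `hadamard` and `spectrum` are choices made for this sketch.

```python
from fractions import Fraction


def hadamard(n):
    """Build H_n from the recursion H_n = [[H, H], [H, -H]] with H = H_{n-1}."""
    h = [[1]]
    for _ in range(n):
        h = [row + row for row in h] + [row + [-v for v in row] for row in h]
    return h


def spectrum(f_vec):
    """Return f* = 2^{-n} H_n f for a truth table f_vec of length 2^n."""
    size = len(f_vec)
    n = size.bit_length() - 1
    h = hadamard(n)
    return [Fraction(sum(h[i][j] * f_vec[j] for j in range(size)), size)
            for i in range(size)]


n = 3
parity = [bin(i).count("1") % 2 for i in range(2 ** n)]
p_star = spectrum(parity)

# Inverse transform f = H_n f* recovers the truth table.
h3 = hadamard(n)
back = [sum(h3[i][j] * p_star[j] for j in range(2 ** n)) for i in range(2 ** n)]
print(p_star, back == parity)
```

For the parity function on three variables the only nonzero coefficients are the zero-order one (1/2, the probability of output 1) and the one indexed by the all-ones string (-1/2).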



We now present an interesting application of harmonic analysis to circuit 
complexity, due to Linial et al. (see [6]). 

As we have already mentioned, one of the best known results in circuit
complexity is that $AC^0$ circuits require exponential size to compute the parity
function. More precisely, $AC^0$-circuits cannot even approximate the parity function.
This fact has a direct consequence on the Fourier transform, because, as we have 
already mentioned, the Fourier coefficients measure the correlation between the 
function and the parity of subsets of its input bits. Consequently, each high order 
Fourier coefficient of an AC^ function must be very small (where “high order” 
means coefficients corresponding to strings of large cardinality). By exploiting 
this fact, Linial et al. were able to prove that not only is each individual high 
order coefficient small, but in fact the sum of squares (i.e. the power spectrum) 
associated with all high Fourier coefficients is very small. 

Lemma 1 ([6]) Let f be a Boolean function on n variables computable by a
Boolean circuit of depth d and size M, and let $\theta$ be any integer. Then

$$\sum_{|w| > \theta} (f^*(w))^2 \le 2M \, 2^{-\theta^{1/d}/20}.$$
4 The Classes $B_n^{(t)}$ and their Levels

In this section we define classes of functions which generalize the notion of
k-balanced functions introduced in [1]. Let t be a positive integer, 0 < t < n.

Definition 1 $B_n^{(t)}$ is the class of Boolean functions depending on n
variables that take the value 1 on exactly $2^{n-t}$ input strings.

Making use of the notion of restriction (see Section 2), we organize the functions
in each class into a sequence of levels according to the following definition:

Definition 2 For any $t \le k \le n$, $B_{k,n}^{(t)}$ is the subset of
$B_n^{(t)}$ consisting of all functions f such that, for any restriction $\rho$
of size k, $f_\rho \in B_k^{(t)}$. We call $B_{k,n}^{(t)}$ the k-th level of
$B_n^{(t)}$.

In other words, $B_{k,n}^{(t)}$ consists of all functions f, of cardinality
$|f| = 2^{n-t}$, such that any subfunction $f_\rho$ depending on k variables has
cardinality $|f_\rho| = 2^{k-t}$.

We now state some basic properties of the hierarchy of levels $B_{k,n}^{(t)}$.
Let $t \le k < n$.

- $B_{k,n}^{(t)} \subseteq B_{k+1,n}^{(t)}$.

- $B_{n,n}^{(t)} = B_n^{(t)}$.

- The classes of k-balanced functions defined in [1] correspond to the k-th
levels of $B_n^{(1)}$.

- The parity function and its complement are the only two functions which
belong to the first level of $B_n^{(1)}$, i.e. to $B_{1,n}^{(1)}$.

In Section 7 we will provide nontrivial examples of functions in any level of the
class $B_n^{(t)}$. More precisely, we will prove that, for any t, $B_{k,n}^{(t)}$
is strictly contained in $B_{k+1,n}^{(t)}$ and that the levels $B_{k,n}^{(t)}$
are not empty, provided that $k \ge \frac{t-1}{t} n$.

All these proofs will make use of the spectral characterization of these functions,
which we derive in the following section.

5 Spectral Characterization of the Hierarchy of $B_{k,n}^{(t)}$ Functions

We now derive a spectral characterization of the functions in any level of the
class $B_n^{(t)}$. We denote by $f_0^*$ the zero-order Fourier coefficient.

Theorem 2 A Boolean function $f : \{0,1\}^n \to \{0,1\}$ belongs to the k-th
level of the class $B_n^{(t)}$ if and only if the following two properties hold:

(1) $f_0^* = 2^{-t}$;

(2) for any string w such that $0 < |w| \le n - k$, $f^*(w) = 0$.
Proof.

- If $f \in B_{k,n}^{(t)}$, then property (1) holds by definition, because
$f_0^*$ is equal to the probability that the function f takes the value 1.
Thus, we only need to prove property (2).

Let $\mu = (\mu_1, \mu_2, \dots, \mu_n)$ be a Boolean string such that
$0 < |\mu| = n - \ell \le n - k$. Moreover, let $U = \{i \mid \mu_i = 1\}$. For
any string $u \in \{0,1\}^{n-\ell}$, let $f_u$ denote the subfunction defined by
the restriction that assigns to the variables $x_i$ such that $i \in U$ the
$(n - \ell)$ values taken from the string u, and leaves undetermined the other
$\ell$ variables.

Then, we have

$$f^*(\mu) = \frac{1}{2^n} \sum_{w} Q_\mu(w) f(w)
= \frac{1}{2^n} \sum_{u \in \{0,1\}^{n-\ell}} \Bigl[ (-1)^{|u|} \sum_{v \in \{0,1\}^{\ell}} f_u(v) \Bigr]
= \frac{1}{2^n} \sum_{u \in \{0,1\}^{n-\ell}} (-1)^{|u|} \, |f_u|.$$

For any $u \in \{0,1\}^{n-\ell}$, the subfunction $f_u$ depends on $\ell \ge k$
variables and, since $f \in B_{k,n}^{(t)}$ and
$B_{k,n}^{(t)} \subseteq B_{\ell,n}^{(t)}$ for any $\ell \ge k$, we have
$f_u \in B_\ell^{(t)}$ and $|f_u| = 2^{\ell - t}$. Thus, we get

$$f^*(\mu) = \frac{2^{\ell - t}}{2^n} \sum_{u \in \{0,1\}^{n-\ell}} (-1)^{|u|} = 0.$$



- We now prove that if properties (1) and (2) hold, then $f \in B_{k,n}^{(t)}$.

Let us choose (n - k) variables out of n, and let U be the set of the indices of
these (n - k) variables. For any $u \in \{0,1\}^{n-k}$, let $f_u$ denote the
subfunction obtained from f by assigning to the variables in the set U the
(n - k) values taken from the string u, and leaving undetermined the other k
variables. For any u, $f_u$ depends on k variables. We show that any such
subfunction accepts exactly $2^{k-t}$ inputs, i.e. for any string u,
$|f_u| = 2^{k-t}$.

Let $f_U$ denote the vector of the cardinalities of the $2^{n-k}$ subfunctions
$f_u$, and let $f_U^*$ denote the vector of the Fourier coefficients related to
the $2^{n-k}$ strings $w = (w_1, \dots, w_n)$ such that $w_i = 0$ for any
$i \notin U$. Note that all the $2^{n-k}$ coefficients in the vector $f_U^*$ are
of order less than or equal to n - k. Because of the recursive definition of
Hadamard matrices, it turns out that

$$f_U^* = \frac{1}{2^n} H_{n-k} \, f_U.$$

From properties (1) and (2), and from the fact that the zero-order Fourier
coefficient is equal to the probability that the function takes the value 1, it
then follows

$$f_U^* = \frac{1}{2^t} \, (1, 0, \dots, 0)^T,$$

from which

$$f_U = 2^n H_{n-k}^{-1} f_U^* = \frac{2^n}{2^{n-k}} H_{n-k} f_U^*
= 2^{k-t} H_{n-k} \, (1, 0, \dots, 0)^T = 2^{k-t} \, (1, 1, \dots, 1)^T.$$

Thus, the theorem follows by repeating the same argument for all the
$\binom{n}{k}$ choices of the set U. ■
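Theorem 2 can be checked by brute force on small functions. The following Python sketch compares the combinatorial definition of the k-th level with the spectral condition. It is a sketch under assumptions: the spectral condition is read as "f*(w) = 0 for 0 < |w| ≤ n - k", functions are callables on 0/1 tuples, and the names `in_level` and `spectral_test` are illustrative.

```python
from itertools import combinations, product


def in_level(f, n, k, t):
    """Combinatorial test: |f| = 2^(n-t) and every subfunction on k
    variables accepts exactly 2^(k-t) inputs."""
    points = list(product((0, 1), repeat=n))
    if sum(f(x) for x in points) != 2 ** (n - t):
        return False
    for fixed in combinations(range(n), n - k):
        counts = {}
        for x in points:
            key = tuple(x[i] for i in fixed)  # values assigned by the restriction
            counts[key] = counts.get(key, 0) + f(x)
        if any(c != 2 ** (k - t) for c in counts.values()):
            return False
    return True


def spectral_test(f, n, k, t):
    """Spectral test: f*(0) = 2^-t and f*(w) = 0 for 0 < |w| <= n - k."""
    points = list(product((0, 1), repeat=n))
    for w in points:
        weight = sum(w)
        if weight > n - k:
            continue
        # total equals 2^n * f*(w), kept as an integer
        total = sum((-1) ** sum(a * b for a, b in zip(w, x)) * f(x) for x in points)
        expected = 2 ** (n - t) if weight == 0 else 0
        if total != expected:
            return False
    return True


def parity(x):
    return sum(x) % 2


def xor12(x):
    return x[0] ^ x[1]  # depends only on the first two of three variables


for k in (1, 2, 3):
    print(k, in_level(xor12, 3, k, 1), spectral_test(xor12, 3, k, 1))
```

Parity on four variables passes both tests at every level, while the second example lies in the levels k = 2 and k = 3 of its class but not in the first level, and both tests agree on that.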



6 A Lower Bound on the Size Complexity of $B_{k,n}^{(t)}$ Functions

We are now able to state and prove our main result, stating that $AC^0$-circuits
cannot compute functions in the k-th level of $B_n^{(t)}$, whenever
$k = n - (\log n)^{\omega(1)}$ and $t = (\log n)^{O(1)}$.

We first make use of the spectral characterization derived in Theorem 2,
together with Lemma 1, to determine a lower bound on the size required by a
depth d circuit to compute functions in the k-th level of $B_n^{(t)}$. Finally,
an easy application of this bound will provide our thesis. Let t > 0.

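Theorem 3 Let $f \in B_{k,n}^{(t)}$ be a Boolean function depending on n
variables, computable by a circuit of constant depth d and size M. Then

$$M \ge 2^{\frac{1}{20}(n-k)^{1/d} - t - 2}.$$

Proof. An application of Lemma 1 yields the following inequality:

$$M \ge 2^{\frac{\theta^{1/d}}{20}} \cdot \frac{1}{2} \sum_{|w| > \theta} (f^*(w))^2.$$

Let us choose $\theta = n - k$. From the fact that $f^*(w) = 0$ for any
$0 < |w| \le n - k$ (see Theorem 2) it follows

$$\sum_{|w| > n-k} (f^*(w))^2 = \sum_{w : |w| \ne 0} (f^*(w))^2 = \sum_{w} (f^*(w))^2 - (f_0^*)^2,$$

where $f_0^*$ denotes the zero-order Fourier coefficient. Then, by using
Parseval’s identity $\sum_w (f^*(w))^2 = f_0^* = 2^{-t}$, we get

$$\sum_{|w| > n-k} (f^*(w))^2 = 2^{-t} - 2^{-2t} \ge 2^{-(t+1)},$$

and the thesis immediately follows:

$$M \ge 2^{\frac{(n-k)^{1/d}}{20}} \cdot \frac{1}{2} \sum_{|w| > n-k} (f^*(w))^2
\ge 2^{\frac{1}{20}(n-k)^{1/d} - t - 2}. \qquad ■$$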
Notice how this result establishes a clear connection between complexity and 
combinatorial properties of Boolean functions. 

Our main result, stating that $AC^0$-circuits cannot compute functions in the
k-th level of $B_n^{(t)}$, whenever $k = n - (\log n)^{\omega(1)}$ and
$t = (\log n)^{O(1)}$, follows immediately as a corollary of Theorem 3.

Corollary 4 Any function $f \in B_{k,n}^{(t)}$, with
$k = n - (\log n)^{\omega(1)}$ and $t = (\log n)^{O(1)}$, requires
superpolynomial size to be computed by a constant depth circuit.

Proof. Easily follows from Theorem 3. ■

Note how the lower bound on the size can become exponential:

Corollary 5 Constant depth circuits require exponential size to compute
functions in levels $B_{k,n}^{(t)}$ whenever k is s.t. $n - k = \Omega(n^\varepsilon)$,
for any positive constant $\varepsilon \le 1$, and $t = (\log n)^{O(1)}$.

Proof. Immediate from Theorem 3. ■



7 Properties of the Hierarchy 

In this section we provide nontrivial examples of functions in the levels of the
hierarchy $B_{k,n}^{(t)}$. More precisely, by applying the spectral
characterization derived in Section 5, we prove that, for any t, $B_{k,n}^{(t)}$
is strictly contained in $B_{k+1,n}^{(t)}$ and that the sets $B_{k,n}^{(t)}$ are
not empty, provided that $k \ge \frac{t-1}{t} n$.

Theorem 6 For any n, $B_{k,n}^{(t)} \ne \emptyset$ if $k \ge \frac{t-1}{t} n + 1$.

Proof. By induction on t.

Base

For any n, and for t = 1, the parity function and its complement belong to
$B_{1,n}^{(1)}$, and $\frac{t-1}{t} n = 0$.
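Induction step

Let us suppose that, for any n, $B_{k,n}^{(t)} \ne \emptyset$ if
$k \ge \frac{t-1}{t} n + 1$.

Let g be a Boolean function, depending on n - m variables, which belongs
to the “deepest” level of $B_{n-m}^{(t)}$. That is, $g \in B_{k,n-m}^{(t)}$ for
$k = \frac{t-1}{t}(n - m) + 1$ (w.l.o.g. let us assume that t divides n - m).

We define f, depending on n variables, as follows:

$$f(\alpha\beta) = \begin{cases} 0 & \text{if } |\alpha| \text{ is odd} \\ g(\beta) & \text{if } |\alpha| \text{ is even,} \end{cases}$$

where $\alpha \in \{0,1\}^m$ and $\beta \in \{0,1\}^{n-m}$. First of all, note
that $f \in B_n^{(t+1)}$. Indeed, $|f| = 2^{m-1} |g| = 2^{m-1} 2^{n-m-t} = 2^{n-(t+1)}$.

From the definition of f, and from the structure of Hadamard matrices, it
turns out that the spectrum of f can be defined in terms of the spectrum of
g and of the spectrum $p^*$ of the parity function, in the following way:

$$f^*(\alpha\beta) = \begin{cases} (1 - p^*(\alpha)) \, g^*(\beta) & \text{if } |\alpha| = 0 \\ -p^*(\alpha) \, g^*(\beta) & \text{if } |\alpha| > 0. \end{cases}$$

Thus, since the parity function has the following Fourier spectrum

$$p^*(\alpha) = \begin{cases} 1/2 & \text{if } |\alpha| = 0 \\ 0 & \text{if } 0 < |\alpha| < m \\ -1/2 & \text{if } |\alpha| = m, \end{cases}$$

we get

$$f^*(\alpha\beta) = \begin{cases} \frac{1}{2} g^*(\beta) & \text{if } |\alpha| = 0 \\ 0 & \text{if } 0 < |\alpha| < m \\ \frac{1}{2} g^*(\beta) & \text{if } |\alpha| = m. \end{cases}$$

If we now use the fact that $g \in B_{k,n-m}^{(t)}$ with
$k = \frac{t-1}{t}(n-m) + 1$, together with the spectral characterization of
Theorem 2, we obtain that $f^*(w) = 0$ whenever
$0 < |w| < \min\{\frac{n-m}{t}, m\}$.

Therefore, the thesis follows if we choose $m = \frac{n}{t+1}$. Indeed, we have
that $f^*(w) = 0$ whenever $0 < |w| < \frac{n}{t+1}$, and from Theorem 2 it
follows that $f \in B_{k,n}^{(t+1)}$ for $k = \frac{t}{t+1} n + 1$, which
completes our induction. ■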
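The construction in the induction step can be tried out on a tiny instance. The Python sketch below takes g to be the parity function on two variables (so t = 1), builds f on n = 4 variables with m = 2, and checks by brute force, using the combinatorial definition of the levels, that f lies in the third level of the class with parameter t + 1 = 2 but not in the second. The parameter values and the names `level_holds` and `build_f` are illustrative assumptions of this sketch.

```python
from itertools import combinations, product


def level_holds(f, n, k, t):
    """True if |f| = 2^(n-t) and every subfunction on k variables
    accepts exactly 2^(k-t) inputs."""
    points = list(product((0, 1), repeat=n))
    if sum(f(x) for x in points) != 2 ** (n - t):
        return False
    for fixed in combinations(range(n), n - k):
        counts = {}
        for x in points:
            key = tuple(x[i] for i in fixed)
            counts[key] = counts.get(key, 0) + f(x)
        if any(c != 2 ** (k - t) for c in counts.values()):
            return False
    return True


def build_f(g, m):
    """f(alpha beta) = 0 if |alpha| is odd, g(beta) if |alpha| is even."""
    def f(x):
        alpha, beta = x[:m], x[m:]
        return 0 if sum(alpha) % 2 else g(beta)
    return f


def parity(x):
    return sum(x) % 2


n, m, t = 4, 2, 1          # m = n / (t + 1)
f = build_f(parity, m)     # g = parity on n - m = 2 variables
print(level_holds(f, n, 3, t + 1), level_holds(f, n, 2, t + 1))
```

With these values the level predicted by the proof is k = (t/(t+1)) n + 1 = 3, which matches the brute-force result.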



Notice that, because of its construction, the function / defined in the proof of 
the above theorem is nondegenerated, i.e. it depends on all input variables. 

By defining f in a more complicated way it is possible, in some cases, to
decrease the bound on k, but only by a constant factor. Therefore, we conjecture
that the bound on k given in Theorem 6 is not far from being asymptotically
optimal.

Theorem 6 is an interesting result for the following two reasons. First of all,
it allows us to verify that the classes of functions under investigation are not
empty, at least for a significant number of levels. Moreover, since for constant
values of t the functions in the deepest levels of the hierarchy can be regarded
as a “generalization” of the parity function, it is interesting to understand how
“deep” we can go in such a generalization, i.e. how close the combinatorial
structure of our functions is to the combinatorial structure of parity.

We now prove that, for any t, $B_{k,n}^{(t)}$ is strictly contained in
$B_{k+1,n}^{(t)}$, provided that $k \ge \frac{t-1}{t} n$. In other words, we
prove that nontrivial examples of functions do exist for any level of
$B_n^{(t)}$.

Theorem 7 If $k \ge \frac{t-1}{t} n$, then $B_{k,n}^{(t)}$ is strictly contained
in $B_{k+1,n}^{(t)}$.

Proof. The proof of the theorem is strictly related to that of Theorem 6. For
t = 1, $B_{k,n}^{(1)}$ is strictly contained in $B_{k+1,n}^{(1)}$ for $k \ge 1$
(see [1]).

For the induction step, we construct a function exactly as we did in the proof
of Theorem 6. Then, by choosing for m any value between 1 and $\frac{n}{t+1}$,
we get a function f such that $f \in B_{k+1,n}^{(t+1)}$ but
$f \notin B_{k,n}^{(t+1)}$. ■

8 Conclusion 

Any attempt to find connections between mathematical properties and
complexity has a strong relevance to the field of Complexity Theory. This is due
to the lack of mathematical techniques to prove lower bounds for general models
of computation. This work represents a step in this direction: we define
combinatorial properties that make Boolean functions “hard” to compute and show
how the Fourier transform could be used as a mathematical tool for the analysis
of Boolean function complexity. Further work to be done includes a deeper
analysis of the structure of the levels $B_{k,n}^{(t)}$ in order to get an
optimal lower bound on k and, more generally, a deeper analysis of the
connections between combinatorial properties, spectral properties and complexity
of Boolean functions.

References 

1. A. Bernasconi. On the Complexity of balanced Boolean Functions. CIAC’97,
Lecture Notes in Computer Science 1203 (1997) pp. 253-263.

2. M. Furst, J. Saxe, M. Sipser. Parity, circuits, and the polynomial-time hierarchy.
Math. Syst. Theory, Vol. 17 (1984) pp. 13-27.

3. J. Håstad. Computational limitations for small depth circuits. Ph.D. Dissertation,
MIT Press, Cambridge, Mass. (1986).

4. S.L. Hurst, D.M. Miller, J.C. Muzio. Spectral Method of Boolean Function
Complexity. Electronics Letters, Vol. 18 (33) (1982) pp. 572-574.

5. R. J. Lechner. Harmonic Analysis of Switching Functions. In Recent Development
in Switching Theory, Academic Press (1971) pp. 122-229.

6. N. Linial, Y. Mansour, N. Nisan. Constant Depth Circuits, Fourier Transform,
and Learnability. Journal of the ACM, Vol. 40 (3) (1993) pp. 607-620.

7. H. U. Simon. A tight Ω(log log n) bound on the time for parallel RAM’s to compute
nondegenerate Boolean functions. FCT’83, Lecture Notes in Computer Science 158
(1983).

8. I. Wegener. The complexity of Boolean functions. Wiley-Teubner Series in Comp.
Sci., New York - Stuttgart (1987).



Eulerian Secret Key Exchange 

(Extended Abstract) 



Takaaki Mizuki^1, Hiroki Shizuya^2, and Takao Nishizeki^1

^1 Graduate School of Information Sciences, Tohoku University,
Sendai 980-8579, Japan
{mizuki,nishi}@ecei.tohoku.ac.jp
^2 Education Center for Information Processing, Tohoku University,
Sendai 980-8576, Japan
shizuya@ecip.tohoku.ac.jp



Abstract. Designing a protocol to exchange a secret key is one of the 
most fundamental subjects in cryptography. Using a random deal of 
cards, pairs of card players (agents) can share information-theoretically 
secure keys that are secret from an eavesdropper. In this paper we first in- 
troduce the notion of an Eulerian secret key exchange, in which the pairs 
of players sharing secret keys form an Eulerian circuit passing through 
all players. Along the Eulerian circuit any designated player can send a 
message to the rest of players and the message can be finally returned 
to the sender. Checking whether the returned message is the same as 
the original one, the sender can know whether the message circulation 
has been completed without any false alteration. We then give three ef- 
ficient protocols to realize such an Eulerian secret key exchange. Each of 
the three protocols is optimal in a sense. The first protocol requires the 
minimum number of cards under a natural assumption that the same 
number of cards are dealt to each player. The second requires the min- 
imum number of cards dealt to all players when one does not make the 
assumption. The third forms the shortest Eulerian circuit, and hence the 
time required to send the message to all players and acknowledge the 
secure receipt is minimum in this case. 



1 Introduction 

Suppose that there are k (≥ 2) players $P_1, P_2, \dots, P_k$ and a passive
eavesdropper, Eve, whose computational power is unlimited. All players wish to
agree on a one-bit message that is secret from Eve. Since Eve is computationally
unlimited, the secret message must be information-theoretically secure. Let C be
a set of d distinct cards which are numbered from 1 to d. All cards in C are
randomly dealt to players $P_1, P_2, \dots, P_k$ and Eve. Let $C_i \subseteq C$
be $P_i$’s hand, and let $C_e \subseteq C$ be Eve’s hand. We denote this deal by
$\mathcal{C} = (C_1, C_2, \dots, C_k; C_e)$. Clearly $\mathcal{C}$ is a
partition of set C. We write $c_i = |C_i|$ for each $1 \le i \le k$ and
$c_e = |C_e|$. We say that $\gamma = (c_1, c_2, \dots, c_k; c_e)$ is the
signature of deal $\mathcal{C}$. The set C and the signature $\gamma$ are public
to all the players and even to Eve, but the hands $C_i$, $1 \le i \le k$, and
$C_e$ are held exclusively by $P_i$, $1 \le i \le k$, and Eve, respectively,
as in the case of usual card games.
Fischer, Paterson and Rackoff [1] give a protocol for two players to exchange
a one-bit key which is secret from Eve. A player encrypts a one-bit message by
adding (modulo 2) the message and the key, and sends it to the other player.
The decryption is done in the same way at the recipient. This cryptographic
communication is information-theoretically secure. On the other hand, Fischer
and Wright [2,5] extend the protocol so that any number k of players can share
a one-bit message. Their main idea is to form a spanning tree over the players. 
That is, they regard each player as a vertex of a graph G, and regard each pair of 
players sharing a one-bit key as an edge of G. Their protocol is designed so that 
G becomes a spanning tree whenever the protocol terminates normally. Along 
the tree a one-bit message can be spread over the players. 

In this paper, we first introduce the notion of an Eulerian secret key exchange 
by which the graph G becomes an Eulerian circuit. See for example Figure 1(a) 
and Figure 2. Refer to [6,7] for the graph-theoretic terminology. Along the Eu- 
lerian circuit any designated player can send a one-bit message to the rest of 
players using the keys shared among the players, and the message can be finally 
returned to the sender. Checking whether the returned message is the same as 
the original one, the sender can know whether the message circulation has been 
probably completed without any false alteration. Thus the secure receipt of the 
message to all players can be acknowledged. This acknowledgment is practically 
important when a computer network is unreliable, possibly due to traffic jam, 
disconnection, or error by some malfunction. We then give three efficient proto- 
cols to realize the Eulerian secret key exchange, using graph-theoretic techniques. 
Each of the three protocols is optimal in a sense among the class of all “key set 
protocols.” The first protocol requires the minimum number of cards under a 
natural assumption that the same number of cards are dealt to each player, that
is, $c_1 = c_2 = \dots = c_k$. The second requires the minimum number of cards when
$c_1 = c_2 = \dots = c_k$ does not always hold. The third forms the shortest Eulerian
circuit, that is, the Eulerian graph G has the fewest edges, and hence the time 
required to send the message to all players along the circuit and acknowledge 
the secure receipt is minimum. 

2 Preliminaries 

In this section we first briefly review the results and techniques given in 
[1,2, 3, 4, 5, 8], and then define some terms. 

The scenario is the same as before: there are $k$ ($\ge 2$) players and Eve, and
the players wish to exchange a one-bit key. The deal of cards is expressed as
$C = (C_1, C_2, \ldots, C_k; C_e)$. A key set $K = \{x, y\}$ consists of two cards $x$ and $y$,
one in $C_i$, the other in $C_j$ with $i \ne j$, say $x \in C_i$ and $y \in C_j$. We say that a key set
$K = \{x, y\}$ is opaque if $1 \le i, j \le k$ and Eve cannot determine whether $x \in C_i$ or
$x \in C_j$ with probability greater than $1/2$. Note that both players $P_i$ and $P_j$ know
that $x \in C_i$ and $y \in C_j$. If $K$ is an opaque key set, $P_i$ and $P_j$ can share a secret
key $r \in \{0, 1\}$, using the following rule agreed on before starting the protocol:
$r = 0$ if $x > y$; $r = 1$, otherwise. Since Eve cannot determine whether $r = 0$
or $r = 1$ with probability greater than $1/2$, the key $r$ is information-theoretically
secure.
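The rule for turning an opaque key set into a key bit can be sketched in a few lines of Python. This is only an illustration: the function name `key_bit` and the representation of cards as integers are assumptions, not part of the paper.

```python
def key_bit(x: int, y: int) -> int:
    """Key bit shared by P_i (who holds card x) and P_j (who holds card y).

    Rule from the text: r = 0 if x > y, and r = 1 otherwise.
    """
    return 0 if x > y else 1


def announced(x: int, y: int) -> tuple:
    """What Eve sees: the key set sorted in ascending order (hypothetical helper)."""
    return tuple(sorted((x, y)))
```

Both owners know which card is `x` and which is `y`, so they compute the same bit, while `announced(7, 3)` and `announced(3, 7)` are identical from Eve's point of view.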

The players can obtain opaque key sets by executing the following protocol
[1,2,5], where $C_1, C_2, \ldots, C_k, C_e$ denote the current hands of $P_1, P_2, \ldots, P_k$, Eve,
respectively. Each player is allowed to discard a card $x$ if necessary. Discarding
$x$ means all the $k$ players agree that $x$ has been removed from someone's hand,
that is, $x \notin (\bigcup_{i=1}^{k} C_i) \cup C_e$.

1. Choose a proposer $P_s$ such that $C_s \ne \emptyset$ and $1 \le s \le k$ by a certain procedure.

2. The proposer $P_s$ determines in mind two cards $x, y$. The cards are randomly
picked so that $x \in C_s$ and $y \notin C_s$. Then $P_s$ proposes $K = \{x, y\}$ as a key
set to all the players. (The key set is proposed just as a set. Actually it is
sorted in some order, for example in ascending order, so Eve learns nothing
about which card belongs to $C_s$ unless Eve holds $y$.)

3. If there exists a player $P_t$ holding $y$, $P_t$ accepts $K$. (Since $K$ is an opaque
key set, $P_s$ and $P_t$ can exchange a one-bit key which is secret from Eve.)
The players discard both $x$ and $y$. Further, whichever of $P_s$ and $P_t$ holds the fewer
cards discards all her cards and drops out of the protocol. Return to Step 1.

4. If there exists no player holding $y$, that is, Eve holds $y$, then the players
discard both $x$ and $y$, and return to Step 1.

These steps 1-4 are repeated until at most one player remains in the protocol.

The protocol above makes the graph $G$ form a spanning tree over the players
under a certain condition. All the players are the vertices of $G$, and all the pairs
of players sharing keys are the edges of $G$. A player $P_i$ can encrypt a one-bit
message for each $P_j$ of the players adjacent to $P_i$ in $G$ by adding (modulo 2) the
message to the key shared by $P_i$ and $P_j$, and send the encrypted message to $P_j$.
The receiver $P_j$ decrypts it by adding it to the key (mod 2), and unless she is a leaf
of the tree, she encrypts the message and sends it to each adjacent player except
$P_i$ in the same way, using the key shared by her and that adjacent player. Thus
any designated player can transmit a one-bit message to the rest of the players along
the tree with information-theoretic security. Fischer and Wright [2,5] give the
so-called SFP procedure for step 1. They show that if $\gamma = (c_1, c_2, \ldots, c_k; c_e)$
satisfies $c_i \ge 1$ for every $i$, $1 \le i \le k$, and $\max c_i + \min c_i \ge k + c_e$, then graph
$G$ always becomes a spanning tree.
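The hop-by-hop transmission along the tree described above can be sketched as follows. This is a minimal illustration under assumptions: the tree is given as a dictionary from unordered player pairs to their shared one-bit keys, and the names `broadcast`, `tree_keys` and `wire` are hypothetical.

```python
from collections import defaultdict


def broadcast(tree_keys, root, bit):
    """Send `bit` from `root` to every player along a key exchange tree.

    tree_keys maps frozenset({u, v}) to the one-bit key shared by u and v.
    Returns the bit each player ends up with and the ciphertexts on the wire.
    """
    adj = defaultdict(list)
    for edge in tree_keys:
        u, v = tuple(edge)
        adj[u].append(v)
        adj[v].append(u)
    received = {root: bit}
    wire = []  # (sender, receiver, ciphertext) triples that Eve may observe
    stack = [(root, None)]
    while stack:
        node, parent = stack.pop()
        for nxt in adj[node]:
            if nxt == parent:
                continue
            key = tree_keys[frozenset((node, nxt))]
            cipher = received[node] ^ key   # encrypt: add the key modulo 2
            wire.append((node, nxt, cipher))
            received[nxt] = cipher ^ key    # decrypt: add the key again
            stack.append((nxt, node))
    return received, wire
```

Each edge key is used once per message bit, which is what makes the addition modulo 2 behave like a one-time pad.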

We then define a key exchange graph, an Eulerian key exchange, and a key 
set protocol, as follows. 

Definition 1 A key exchange graph $G = (V, E)$ is a multigraph with vertex
set $V$ and edge set $E$ such that $V = \{1, 2, \ldots, k\}$ and, for any vertices $i$ and $j$
in $V$, $E$ contains $m$ multiple edges joining $i$ and $j$ if and only if players $P_i$ and
$P_j$ share $m$ one-bit keys, where $m \ge 0$.

From now on we often identify player $P_i$, $1 \le i \le k$, with vertex $i \in V$ in the
graph $G$.

Definition 2 A key exchange is called an Eulerian key exchange if the key
exchange graph $G$ is an Eulerian graph, that is, $G$ has an Eulerian circuit passing
through every edge exactly once. (See for example Figure 1(a).)

We consider the class of key set protocols, obtained by generalizing the pro- 
tocol above. 

Key Set Protocol 

1. Choose a proposer $P_s$ such that $C_s \ne \emptyset$ and $1 \le s \le k$ by a certain procedure
(say "Procedure A").

2. The proposer $P_s$ determines in mind two cards $x$ and $y$. The cards are
randomly picked so that $x \in C_s$ and $y \notin C_s$. Then $P_s$ proposes $K = \{x, y\}$
as a key set to all the players.

3. If there exists a player $P_t$ holding $y$, $P_t$ accepts $K$. Both $x$ and $y$ are discarded.

4. If there exists no player holding $y$, that is, Eve holds $y$, then both $x$ and $y$
are discarded.

5. Determine a subset $X$ of players by a certain procedure (say "Procedure
B"). Each player in $X$ discards all her cards. Every player holding no
card drops out of the protocol. Return to Step 1.

The steps 1-5 above are repeated until at most one player remains in the
protocol.

At step 3 $P_s$ and $P_t$ succeed in sharing a one-bit key, an edge $(s, t)$ joining
vertices $s$ and $t$ is added to graph $G$, and the number of cards held by each of
$P_s$ and $P_t$ decreases by one. At step 4 the numbers of cards held by $P_s$ and Eve
decrease by one, but no new edge is added to $G$.

The procedure A determines a player to propose a key set, and the procedure
B determines the players to be forced to drop out of the protocol. By considering
different procedures A and B, we obtain the class of key set protocols. The key
set protocol that results from procedures A and B is called an (A, B)-protocol.
We assume that procedures A and B can take only the signature, the sizes of
current hands, and the current key exchange graph as the input data.

The protocol of [2,5] mentioned above is a kind of key set protocol. Its
procedure A, i.e. the SFP procedure, chooses a proposer $s$ as follows: if there is a
vertex $i \in V$ such that $|C_i| = 1$, $|C_e| = 0$ and $|C_j| \ge 2$ for all vertices $j \ne i$, then
the SFP procedure chooses the vertex $i$ as a proposer $s$; if there is no such $i$ but
there is a vertex $j$ with $|C_j| \ge 2$, then the procedure chooses as $s$ the vertex $j$
having the minimum $|C_j|$ among all these vertices $j$; otherwise, the procedure
chooses any vertex as $s$. Its procedure B determines a set $X$ as follows:

\[
X = \begin{cases}
\{s\} & \text{if } y \in C_t \text{ and } |C_s| < |C_t|; \\
\{t\} & \text{if } y \in C_t \text{ and } |C_s| \ge |C_t|; \\
\emptyset & \text{if } y \in C_e.
\end{cases}
\]

Thus, when $y \in C_t$, whichever of $P_s$ and $P_t$ holds the fewer cards drops out of the
protocol.
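To make the interplay of procedures A and B concrete, the following Python sketch simulates the key set protocol with the SFP procedure and the drop-out rule just described. It is an illustration under assumptions, not the authors' implementation: signatures are passed as a flat tuple `(c_1, ..., c_k, c_e)`, cards are integers, ties in the SFP choice are broken by player index, and all function names are hypothetical.

```python
import random


def sfp_proposer(hands, eve):
    """Procedure A of [2,5] as described in the text (ties broken by index)."""
    active = [i for i, h in hands.items() if h]
    if not eve:
        for i in active:
            if len(hands[i]) == 1 and all(
                len(hands[j]) >= 2 for j in active if j != i
            ):
                return i
    big = [j for j in active if len(hands[j]) >= 2]
    if big:
        return min(big, key=lambda j: len(hands[j]))
    return active[0]


def run_key_set_protocol(signature, rng):
    """Deal cards at random and run steps 1-5; return the list of key edges."""
    *sizes, c_e = signature
    deck = list(range(sum(sizes) + c_e))
    rng.shuffle(deck)
    hands, pos = {}, 0
    for i, c in enumerate(sizes, start=1):
        hands[i] = set(deck[pos:pos + c])
        pos += c
    eve = set(deck[pos:])
    edges = []
    while sum(1 for h in hands.values() if h) > 1:
        s = sfp_proposer(hands, eve)
        x = rng.choice(sorted(hands[s]))
        others = sorted(c for i, h in hands.items() if i != s for c in h)
        y = rng.choice(others + sorted(eve))
        hands[s].discard(x)
        if y in eve:                      # step 4: Eve holds y
            eve.discard(y)
            continue
        t = next(i for i, h in hands.items() if y in h)
        hands[t].discard(y)               # step 3: P_t accepts K
        edges.append((s, t))
        loser = s if len(hands[s]) < len(hands[t]) else t   # procedure B
        hands[loser].clear()
    return edges


def is_spanning_tree(k, edges):
    """True if the edges form a spanning tree on vertices 1..k."""
    parent = list(range(k + 1))

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a

    for u, v in edges:
        parent[find(u)] = find(v)
    roots = {find(i) for i in range(1, k + 1)}
    return len(edges) == k - 1 and len(roots) == 1
```

For a signature satisfying the condition quoted above, for example $(3, 2, 2; 1)$, `is_spanning_tree(3, run_key_set_protocol((3, 2, 2, 1), random.Random(0)))` is expected to hold for every random deal.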
Since we wish to make $G$ an Eulerian graph instead of a spanning tree, we
need procedures A and B different from these procedures.

Definition 3 A key set protocol achieves an Eulerian key exchange for a
signature $\gamma = (c_1, c_2, \ldots, c_k; c_e)$ if the key exchange graph $G$ always becomes an
Eulerian graph for any deal $C$ having the signature $\gamma$ when the players follow the
protocol.

3 Protocol for Eulerian Key Exchange 

In this section we give the first protocol, the $(A_1, B_1)$-protocol, of our three key
set protocols. The $(A_1, B_1)$-protocol is optimal among the class of all key set
protocols achieving an Eulerian key exchange for a signature $\gamma = (c_1, c_2, \ldots, c_k; c_e)$
with $c_1 = c_2 = \cdots = c_k$ in the sense that the number of required cards is
minimum.

We give procedures $A_1$ and $B_1$ only for $k \ge 4$, but omit those for $k = 2, 3$
since they are similar to those for $k \ge 4$.

The smaller the degrees of vertices in a key exchange graph $G$ are, the fewer
cards are required. We thus want to make the degree of each vertex of
$G$ as small as possible. However, we cannot always make every vertex have
degree exactly two, that is, we cannot always form a Hamiltonian graph. We
therefore make the degree of each vertex either two or four. This is our idea
behind the $(A_1, B_1)$-protocol. Note that a graph is Eulerian if and only if the
graph is connected and every vertex has even degree [6].
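The characterization just cited gives a direct test for whether a key exchange graph is Eulerian. The sketch below assumes the multigraph is given as a list of vertex pairs on vertices $1, \ldots, k$ (repeated pairs are multiple edges); the function name `is_eulerian` is hypothetical.

```python
from collections import defaultdict


def is_eulerian(k, edges):
    """True if the multigraph on vertices 1..k is connected with all degrees even."""
    degree = defaultdict(int)
    adj = defaultdict(set)
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
        adj[u].add(v)
        adj[v].add(u)
    if any(degree[v] % 2 for v in range(1, k + 1)):
        return False
    # Depth-first search from vertex 1 to check connectivity.
    seen, stack = {1}, [1]
    while stack:
        u = stack.pop()
        for w in adj[u]:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return len(seen) == k
```

A spanning tree fails this test because its leaves have degree one, whereas a cycle or a graph whose vertices all have degree two or four and which is connected passes it.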

We denote by $d(i)$ the degree of vertex $i \in V$ in graph $G = (V, E)$. A vertex
$i \in V$ having a non-empty current hand $C_i \ne \emptyset$ is called a white vertex, and a
vertex $i \in V$ having an empty current hand $C_i = \emptyset$ is called a black vertex. White
and black vertices are drawn respectively by white and black circles in all figures
in the paper. A white vertex corresponds to a player remaining in a protocol,
and a black vertex corresponds to a player who has dropped out of the protocol.
Let $W$ be the set of all white vertices. Figure 1 illustrates connected components
of an intermediate graph $G$. During any execution of the protocol, the degree of
each vertex of $G$ is at most four; in particular, the degree of each black vertex
is either two or four; each connected component of an intermediate graph $G$
has one or two white vertices; if a component of an intermediate graph $G$ has
exactly one white vertex as illustrated in Figures 1(a) and (c), then the degree
of this vertex is zero or two; a final Eulerian graph $G$ has at most one white
vertex, the degree of which is two or four; on the other hand, if a component
of an intermediate graph $G$ has two white vertices as in Figures 1(b), (d) and
(e), then the degrees of these vertices are odd, namely one or three. Note that
every connected component of any graph has an even number of vertices of odd
degree [6].

Fig. 1. Connected components of graph G during an execution of the protocol.

Before the $(A_1, B_1)$-protocol terminates, the degree $d(i)$ of each white vertex
$i$ is either 0, 1, 2, or 3. We partition the set $W$ of all white vertices $i$ in an
intermediate graph $G$ into three subsets $W_1$, $W_2$ and $W_3$, depending on the
degrees $d(i)$ and $d(j)$ where $j$ is the other white vertex, if any, in the connected
component containing $i$, as follows:

$W_1 = \{ i \in W \mid [d(i) = 2] \lor [d(i) = 3 \text{ and } d(j) = 1] \}$;

$W_2 = \{ i \in W \mid [d(i) = 0] \lor [d(i) = 1 \text{ and } d(j) = 1] \}$; and

$W_3 = W - W_1 - W_2$.

Note that

$W_3 = \{ i \in W \mid [d(i) = 1 \text{ and } d(j) = 3] \lor [d(i) = 3 \text{ and } d(j) = 3] \}$.

The white vertex of degree 2 in Figure 1(a) and the white vertex of degree 3 in
Figure 1(b) are in $W_1$. The white vertices in Figures 1(c) and (d) are in $W_2$. The
white vertex of degree 1 in Figure 1(b) and the white vertices in Figure 1(e) are
in $W_3$.
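The partition of white vertices can be written as a small classification function. This is a sketch only: the name `classify` is an assumption, and `d_j` is passed as `None` when the component contains no second white vertex.

```python
def classify(d_i, d_j=None):
    """Return 'W1', 'W2' or 'W3' for a white vertex of degree d_i.

    d_j is the degree of the other white vertex in the same connected
    component, or None if there is no such vertex.
    """
    if d_i == 2 or (d_i == 3 and d_j == 1):
        return "W1"
    if d_i == 0 or (d_i == 1 and d_j == 1):
        return "W2"
    return "W3"
```

The calls `classify(2)`, `classify(3, 1)`, `classify(0)`, `classify(1, 1)`, `classify(1, 3)` and `classify(3, 3)` reproduce the membership of the white vertices described for Figures 1(a) to 1(e).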

Our procedure $A_1$ for step 1 chooses as a proposer $s$ a vertex in $W$ with the
priority order $W_1$, $W_2$, $W_3$; we will explain the reason after
giving procedures $A_1$ and $B_1$. Furthermore we keep every vertex $i \in W$ satisfying
$|C_i| + d(i) \ge 4$. It should be noted that $|C_i| + d(i) \ge 4$ at the beginning of the
protocol if $c_i \ge \lceil c_e/2 \rceil + 4$, and that if $|C_i| + d(i) \le 3$ then $d(i)$ could not become
four in the final graph $G$.

Procedure $A_1$: Choose a proposer $s \in W$ as follows.

Case A1: $C_e \ne \emptyset$ and there exists $i \in W_1 \cup W_2$ such that $|C_i| + d(i) \ge 5$.

If there exists a vertex $i \in W_1$ such that $|C_i| + d(i) \ge 5$ (Figures 1(a), (b)),
choose any of these vertices as a proposer $s$. Otherwise, choose as $s$ any $i \in W_2$
such that $|C_i| + d(i) \ge 5$ (Figures 1(c), (d)).

Case A2: $C_e = \emptyset$ and $W_1 \cup W_2 \ne \emptyset$.

If $W_1 \ne \emptyset$, choose any vertex in $W_1$ as a proposer $s$ (Figures 1(a), (b)). If
$W_1 = \emptyset$, choose as $s$ any vertex in $W_2$ (Figures 1(c), (d)).

Case A3: Otherwise.

(We can show that in this case $G$ is a connected graph with $|W| = 2$ and
$W_3 \ne \emptyset$ like in Figure 1(b) or 1(e), and hence the protocol terminates when the
two vertices $i$ and $j$ in $W$ are joined by an edge.) Choose as $s$ the vertex $i$ in $W$
such that $|C_i| \ge |C_j|$.

Fig. 2. A generating process of an Eulerian graph containing the graph in Figure 1(e).

An edge in a graph is called a bridge if the graph has no cycle passing through
the edge [6]. If vertices $s$ and $t$ were contained in different connected components
before edge $(s, t)$ is added, then the edge is a bridge in the new graph after $(s, t)$ is
added. On the other hand, if both $s$ and $t$ were in the same connected component,
then the edge is not a bridge in the new graph. Our procedure $B_1$ for step 5 is
simple, as follows.

Procedure $B_1$: Choose set $X$ as follows.

\[
X = \begin{cases}
\{ i \mid [i = s \text{ or } t] \land [d(i) \text{ is even}] \} & \text{if } [y \in C_t] \land [\text{edge } (s,t) \text{ is a bridge in } G]; \\
\{s\} & \text{if } [y \in C_t] \land [\text{edge } (s,t) \text{ is not a bridge in } G]; \\
\emptyset & \text{if } y \in C_e.
\end{cases}
\]

The degree of $s$ or $t$ changes only when $y \in C_t$. If the degree of $s$ or $t$ becomes
an even positive number, then the vertex is forced to drop out of the protocol. If
$(s, t)$ is not a bridge in $G$, then the degrees of both $s$ and $t$ become even positive
numbers in $G$ but we make only $s$ drop out of the protocol by setting $X = \{s\}$
as above, because, otherwise, the component containing $s$ and $t$ would have no
white vertices and $G$ could not become connected.
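Procedure $B_1$ can be sketched in Python as below. The sketch relies on the observation in the text that $(s, t)$ is a bridge exactly when $s$ and $t$ lay in different components before the edge was added; the caller is assumed to supply that fact together with the degrees in the new graph. The function and parameter names are hypothetical.

```python
def procedure_b1(s, t, degree, same_component_before):
    """Return the set X of players forced to drop out.

    s: the proposer; t: the player who held y, or None if Eve held y.
    degree: dict of vertex degrees AFTER edge (s, t) has been added.
    same_component_before: True if s and t were already connected,
        i.e. the new edge (s, t) is not a bridge.
    """
    if t is None:                 # y was in Eve's hand
        return set()
    if same_component_before:     # (s, t) is not a bridge
        return {s}
    # (s, t) is a bridge: drop every endpoint whose degree became even.
    return {i for i in (s, t) if degree[i] % 2 == 0}
```

Since a degree is at least one after the edge is added, an even degree here is automatically an even positive number.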

Thus we have given procedures $A_1$ and $B_1$. Figure 2 illustrates how the
$(A_1, B_1)$-protocol generates an Eulerian graph.

We are now ready to explain why we define procedure $A_1$ as
above.

If $s$ was contained in $W_1 \cup W_2$ and $(s, t)$ is not a bridge in a new graph $G$,
then $d(t) = 2$ in $G$. On the other hand, if $s$ were contained in $W_3$ and $(s, t)$ were not
a bridge in $G$, then $s$ would become a black vertex and $t$ would remain a white
vertex in $G$, due to procedure $B_1$, but the white vertex $t$ would have degree 4
although the protocol may not have terminated. Thus we give priority to white
vertices in $W_1 \cup W_2$ over those in $W_3$ when we choose $s$.

If $G$ had two or more connected components having two white vertices of
degree 3 like in Figure 1(e) and the two white vertices of one of the components
were joined by an edge, then one of the two white vertices would become a white
vertex of degree 4. Therefore we do not want to produce two or more connected
components like in Figure 1(e). Whenever such a component is produced, there
must have existed two or more vertices in $W_1$. Therefore we do not wish to
increase the number of vertices in $W_1$. If a vertex in $W_2$ were chosen as $s$ although $W_1 \ne \emptyset$,
then the number of vertices in $W_1$ would increase. Thus we give priority to vertices in
$W_1$ over those in $W_2$ when we choose $s$.

We first have a sufficient condition for the $(A_1, B_1)$-protocol to achieve an
Eulerian key exchange, as follows.

Theorem 1. The $(A_1, B_1)$-protocol achieves an Eulerian key exchange for a
signature $\gamma = (c_1, c_2, \ldots, c_k; c_e)$ if

(a) $k = 2$, $c_1, c_2 \ge 2$, and $c_1 + c_2 \ge c_e + 4$;

(b) $k = 3$, and every vertex $i \in V$ satisfies
\[
c_i \ge \begin{cases} c_e + 2 & \text{if } c_e \le 1; \\ \lceil c_e/2 \rceil + 3 & \text{if } c_e \ge 2; \end{cases}
\]

or

(c) $k \ge 4$, and every vertex $i \in V$ satisfies $c_i \ge \lceil c_e/2 \rceil + 4$.

Proof. Omitted in this extended abstract.

We next give a lower bound on the number of cards required for a key set
protocol to achieve an Eulerian key exchange.

If a key set protocol achieves an Eulerian key exchange for signature $\gamma =
(c_1, c_2, \ldots, c_k; c_e)$, then the key exchange graph $G$ must become an Eulerian
graph at the end of any run of the randomized protocol for any deal $C =
(C_1, C_2, \ldots, C_k; C_e)$ having signature $\gamma$. Hence, whoever has the card $y$ in the
proposed key set $K = \{x, y\}$, $G$ should become an Eulerian graph. Considering
a malicious adversary who chooses $y$ so that the protocol needs as many cards
as possible, we have lower bounds on the number of cards as in the following
Lemma 1.

Lemma 1 If there exists a key set protocol achieving an Eulerian key exchange
for $\gamma = (c_1, c_2, \ldots, c_k; c_e)$, then

(a) $k = 2$, $c_1, c_2 \ge 2$, and $c_1 + c_2 \ge c_e + 4$;

(b-i) $k = 3$, $c_e \le 1$, and either $c_i \ge c_e + 2$ for every vertex $i \in V$ or there
exists a pair of vertices $i, j \in V$ with $c_i + c_j \ge c_e + 6$;

(b-ii) $k = 3$, $c_e \ge 2$, and there exists a pair of vertices $i, j \in V$ with $c_i + c_j \ge
c_e + 6$;

or

(c) $k \ge 4$, and either there exists a vertex $i \in V$ with $c_i \ge c_e + 4$ or there
exists a pair of vertices $i, j \in V$ with $c_i + c_j \ge c_e + 8$.

From Theorem 1 and Lemma 1 we immediately have Theorem 2 on a necessary
and sufficient condition for a key set protocol to achieve an Eulerian key
exchange when $c_1 = c_2 = \cdots = c_k$.

Theorem 2. Let $\gamma = (c_1, c_2, \ldots, c_k; c_e)$ be any signature such that $c_1 = c_2 =
\cdots = c_k$. Then there exists a key set protocol achieving an Eulerian key exchange
for $\gamma$ if and only if

(a) $k = 2$, and $c_1 \ge \lceil c_e/2 \rceil + 2$;

(b) $k = 3$, and
\[
c_1 \ge \begin{cases} c_e + 2 & \text{if } c_e \le 1; \\ \lceil c_e/2 \rceil + 3 & \text{if } c_e \ge 2; \end{cases}
\]

or

(c) $k \ge 4$, and $c_1 \ge \lceil c_e/2 \rceil + 4$.

Thus the $(A_1, B_1)$-protocol is optimal among the class of all key set protocols
achieving an Eulerian key exchange for a signature $\gamma = (c_1, c_2, \ldots, c_k; c_e)$ with
$c_1 = c_2 = \cdots = c_k$ in the sense that the number $kc_1$ of required cards of players

\[
kc_1 = \begin{cases}
2\lceil c_e/2 \rceil + 4 & \text{if } k = 2; \\
3c_e + 6 & \text{if } k = 3 \text{ and } c_e \le 1; \\
3\lceil c_e/2 \rceil + 9 & \text{if } k = 3 \text{ and } c_e \ge 2; \\
k\lceil c_e/2 \rceil + 4k & \text{if } k \ge 4
\end{cases}
\]

is minimum. Of course, the total number $d = kc_1 + c_e$ of cards dealt to the
players and Eve is minimized by the $(A_1, B_1)$-protocol for any $c_e$, too.



4 Protocol Using the Minimum Number of Cards 

In this section we give the second protocol, the $(A_2, B_2)$-protocol, which is
optimal in the sense that the total number $\sum_{p=1}^{k} c_p$ of cards dealt to the players is
minimum when $c_1 = c_2 = \cdots = c_k$ does not always hold.

For the case $k = 2$, Theorem 1 and Lemma 1 imply that the $(A_1, B_1)$-protocol
in the preceding section is optimal and the minimum number of cards is $c_1 + c_2 =
c_e + 4$. We hereafter assume without loss of generality that $k \ge 3$ and $c_1 = \max_{p \in V} c_p$.

Our procedures $A_2$ and $B_2$ for the case $c_e \ge 1$ are very simple, as follows.

Procedure $A_2$: Always choose vertex 1 as a proposer $s$.

Procedure $B_2$: Choose $X$ as follows.
\[
X = \begin{cases} \{t\} & \text{if } d(t) = 2; \\ \emptyset & \text{if } d(t) \le 1. \end{cases}
\]

For the case $c_e \ge 1$, a final Eulerian graph $G$ has a pair of multiple edges
joining vertex 1 and each of the other $k - 1$ vertices. Thus $G$ is a so-called double
star, and is an Eulerian graph. Although we omit the procedures $A_2$ and $B_2$ for
the case $c_e = 0$, the $(A_2, B_2)$-protocol produces an Eulerian graph $G$ in which
vertex 1 and two other vertices induce a triangle, and vertex 1 is joined to each
of the other $k - 3$ vertices by a pair of multiple edges.

Then the following theorem holds on the $(A_2, B_2)$-protocol.

Theorem 3. Let $k \ge 3$. The $(A_2, B_2)$-protocol achieves an Eulerian key
exchange for $\gamma = (c_1, c_2, \ldots, c_k; c_e)$ if

(a) $c_e \ge 1$ and
\[
\begin{cases} c_1 \ge 2k + c_e - 2, \\ c_p \ge 2 \text{ for each } p,\ 2 \le p \le k; \end{cases}
\]

or

(b) $c_e = 0$ and
\[
\begin{cases} c_1 \ge 2k - 4, \\ c_p \ge 2 \text{ for each } p,\ 2 \le p \le k. \end{cases}
\]

Moreover, the $(A_2, B_2)$-protocol is optimal among the class of key set protocols
achieving an Eulerian key exchange for a signature in the sense that the number
of required cards of players

\[
\sum_{p=1}^{k} c_p = \begin{cases} 4k + c_e - 4 & \text{if } c_e \ge 1; \\ 4k - 6 & \text{if } c_e = 0 \end{cases}
\]

is minimum.

Thus the number of cards required by the (A2,B2)-protocol is much smaller 
than that by the (Ai,Bi)-protocol especially when Ce is large. 
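The card counts of the two protocols can be compared numerically with a short Python sketch. The formulas are those stated for $kc_1$ in Section 3 and for $\sum_p c_p$ in Theorem 3; the function names `cards_a1b1` and `cards_a2b2` are assumptions introduced for illustration.

```python
def cards_a1b1(k, c_e):
    """Minimum number k*c_1 of players' cards for the (A1,B1)-protocol."""
    half = (c_e + 1) // 2          # ceil(c_e / 2)
    if k == 2:
        return 2 * half + 4
    if k == 3:
        return 3 * c_e + 6 if c_e <= 1 else 3 * half + 9
    return k * half + 4 * k        # k >= 4


def cards_a2b2(k, c_e):
    """Minimum total number of players' cards for the (A2,B2)-protocol, k >= 3."""
    if k < 3:
        raise ValueError("the (A2,B2)-protocol is stated for k >= 3")
    return 4 * k + c_e - 4 if c_e >= 1 else 4 * k - 6
```

For instance, with $k = 4$ players and $c_e = 10$ the first formula gives 36 cards and the second gives 22, and the gap widens as $c_e$ or $k$ grows.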

One can easily observe that the $(A_2, B_2)$-protocol achieves an Eulerian key
exchange for any signature $\gamma = (c_1, c_2, \ldots, c_k; c_e)$ satisfying (a) or (b) in
Theorem 3. Thus it is easy to prove the first proposition in Theorem 3.

5 Protocol for Shortest Eulerian Circuit 

In this section we give the third protocol, the $(A_1^*, B_1)$-protocol, which is optimal
in the sense that the length of the Eulerian circuit is minimum. The length
of the Eulerian circuit corresponds to the time needed for the circulation of
a message with the acknowledgment of secure receipt. Denote the length by
$l(G)$; then $l(G) = \frac{1}{2} \sum_{p=1}^{k} d(p)$, since $l(G)$ is equal to the number of edges in
the Eulerian key exchange graph $G$. We define the acknowledgment time for a
signature $\gamma = (c_1, c_2, \ldots, c_k; c_e)$ as follows.

Definition 4 We call $T(A, B; \gamma) = \max_{C} \max_{G_C} l(G_C)$ the acknowledgment time
of an (A, B)-protocol for a signature $\gamma$, where $C$ runs over all deals having
signature $\gamma$, and $G_C$ runs over all key exchange graphs formed by any execution of the
protocol for the deal $C$. If the protocol does not achieve an Eulerian key exchange
for $\gamma$, we define $T(A, B; \gamma) = \infty$.

We slightly modify the procedure $A_1$ in Section 3 so that, whenever a vertex
in $W_2$ is chosen as a proposer $s$, an isolated vertex in $W_2$ is chosen as $s$ with
priority over non-isolated vertices in $W_2$. We denote this modified procedure by
$A_1^*$. The procedure $A_1^*$ is as follows.
Procedure $A_1^*$: Choose a proposer $s \in W$ as follows.

Case A1*: $C_e \ne \emptyset$ and there exists $i \in W_1 \cup W_2$ such that $|C_i| + d(i) \ge 5$.

If there exists a vertex $i \in W_1$ such that $|C_i| + d(i) \ge 5$, choose any of these
vertices as a proposer $s$. Otherwise, if there exists an isolated vertex $i \in W_2$
such that $|C_i| + d(i) \ge 5$, then choose any of these vertices as $s$. Otherwise,
choose as $s$ any vertex $i \in W_2$ such that $|C_i| + d(i) \ge 5$.

Case A2*: $C_e = \emptyset$ and $W_1 \cup W_2 \ne \emptyset$.

If $W_1 \ne \emptyset$, choose any vertex in $W_1$ as $s$. If $W_1 = \emptyset$ and there exists an
isolated vertex $i \in W_2$, choose as $s$ any of these vertices. Otherwise, choose as $s$
any vertex in $W_2$.

Case A3*: Otherwise.

In this case $W = \{i, j\}$. Choose as $s$ the vertex $i$ in $W$ such that $|C_i| \ge |C_j|$.



We can prove the following theorem, showing that the $(A_1^*, B_1)$-protocol
produces an Eulerian graph $G$ in which at most half the vertices have degree four
and all the remaining vertices have degree two.

Theorem 4. Let $k \ge 4$, and let $\gamma = (c_1, c_2, \ldots, c_k; c_e)$. Then

\[
T(A_1^*, B_1; \gamma) \le \begin{cases}
\lfloor \frac{3}{2}k \rfloor & \text{if } \lceil c_e/2 \rceil + 4 \le c_p \text{ for all } p \in V; \\
\lfloor \frac{3}{2}k \rfloor - 1 & \text{if } c_e + 4 \le c_p \text{ for all } p \in V.
\end{cases}
\]

Considering a malicious adversary producing at least $\lfloor k/2 \rfloor$ edge-disjoint
cycles in $G$, we have the following theorem on a lower bound on the
acknowledgment time of a key set protocol.

Theorem 5. Let $k \ge 4$, let an (A, B)-protocol be any key set protocol, and let
$\gamma = (c_1, c_2, \ldots, c_k; c_e)$ be any signature. Then

\[
T(A, B; \gamma) \ge \lfloor \tfrac{3}{2}k \rfloor - 1.
\]

In particular, if $c_p \le c_e + 3$ for all $p \in V$, then

\[
T(A, B; \gamma) \ge \lfloor \tfrac{3}{2}k \rfloor.
\]

Theorems 1(c), 2(c), 4 and 5 immediately imply the following Theorem 6.

Theorem 6. Let $k \ge 4$. For any signature $\gamma = (c_1, c_2, \ldots, c_k; c_e)$ satisfying
$c_1 = c_2 = \cdots = c_k$, the $(A_1^*, B_1)$-protocol is optimal among the class of key set
protocols in the sense that the acknowledgment time

\[
T(A_1^*, B_1; \gamma) = \begin{cases}
\lfloor \frac{3}{2}k \rfloor - 1 & \text{if } c_e + 4 \le c_1; \\
\lfloor \frac{3}{2}k \rfloor & \text{if } \lceil c_e/2 \rceil + 4 \le c_1 \le c_e + 3; \\
\infty & \text{if } c_1 \le \lceil c_e/2 \rceil + 3
\end{cases}
\]

is minimum. If $\gamma$ does not always satisfy $c_1 = c_2 = \cdots = c_k$ but satisfies $\lceil c_e/2 \rceil +
4 \le c_p$ for every vertex $p \in V$, then the $(A_1^*, B_1)$-protocol is nearly optimal in
the sense that $T(A_1^*, B_1; \gamma)$ does not exceed the minimum acknowledgment time
by more than one.

Since procedure $A_1^*$ is a type of procedure $A_1$, the $(A_1^*, B_1)$-protocol not
only forms the shortest Eulerian circuit, but also requires the minimum number
of cards when $c_1 = c_2 = \cdots = c_k$.

One can observe that the acknowledgment time of the $(A_1, B_1)$-protocol
satisfies $T(A_1, B_1; \gamma) = 2k - 2$ for a signature $\gamma$ for which the protocol achieves an
Eulerian key exchange, as follows. The protocol may produce a graph $G$ of $2k - 2$
edges in which exactly two vertices have degree 2 and all the other vertices have
degree 4. Therefore $T(A_1, B_1; \gamma) \ge 2k - 2$. Any Eulerian graph $G$ produced by
the $(A_1, B_1)$-protocol has at most $2k - 2$ edges, and hence $T(A_1, B_1; \gamma) \le 2k - 2$.
Note that all vertices in $G$ have degree 2 or 4, and that two or more vertices
including the first black vertex have degree 2 in $G$.

On the other hand, the acknowledgment time of the $(A_2, B_2)$-protocol is

\[
T(A_2, B_2; \gamma) = \begin{cases} 2k - 2 & \text{if } c_e \ge 1; \\ 2k - 3 & \text{if } c_e = 0. \end{cases}
\]

Since $T(A_1^*, B_1; \gamma) \le \lfloor \frac{3}{2}k \rfloor$, the $(A_1^*, B_1)$-protocol produces an Eulerian
circuit much shorter than those by the $(A_1, B_1)$- and $(A_2, B_2)$-protocols, and than
an Eulerian circuit passing through every edge of a spanning tree twice.
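The acknowledgment times stated in this section can be tabulated with a small Python sketch. It assumes equal hand sizes $c_1 = \cdots = c_k$ and $k \ge 4$ for the third protocol, as in Theorem 6; the function names are hypothetical.

```python
import math


def t_a1star_b1(k, c_e, c_1):
    """Acknowledgment time of the third protocol per Theorem 6 (k >= 4, equal hands)."""
    half = (c_e + 1) // 2              # ceil(c_e / 2)
    if c_1 >= c_e + 4:
        return (3 * k) // 2 - 1
    if c_1 >= half + 4:
        return (3 * k) // 2
    return math.inf                    # no Eulerian key exchange is achieved


def t_a1_b1(k):
    """Acknowledgment time 2k - 2 of the (A1,B1)-protocol when it succeeds."""
    return 2 * k - 2


def t_a2_b2(k, c_e):
    """Acknowledgment time of the (A2,B2)-protocol."""
    return 2 * k - 2 if c_e >= 1 else 2 * k - 3
```

With $k = 10$ players, for example, the third protocol needs a circuit of 14 or 15 edges, compared with 18 edges for the first protocol and for a spanning tree traversed twice.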
Abstract. In two-party secure computation, a pair of mutually-
distrusting and potentially malicious parties attempt to evaluate a func-
tion f(x,y) of private inputs x and y, held respectively by each, with-
out revealing anything but f(x,y) and without involving a trusted third
party. This goal has been achieved with varying degrees of generality
and efficiency using a variety of primitives, including combined oblivious
transfer (OT) [GMW87], abstract oblivious transfer [K88], and commit-
ted oblivious transfer [CTG95].

This work introduces the concept of a two-party one-time table (OTT), 
a novel primitive that is theoretically equivalent to precomputed OT. 
The OTT is tailored to support field computations rather than single- 
bit logical operations, thereby streamlining higher-level computations, 
particularly where information-theoretic security is demanded. 

The two-party one-time table is also motivated by the ease with 
which it can be constructed using simple resources provided by one or 
more partly-trusted external servers. This commodity-based approach
strengthens overall security by ensuring that information flows strictly 
from servers to Alice and Bob, removing the need to trust third parties 
with the sensitive data itself. 



1 Introduction

Two-party secure computation is a process by which two parties simulate a 
trusted but nonexistent mediator who helps them to compute a function f{x,y) 
of private inputs held by each. This virtual mediator accepts x from Alice and y 
from Bob, and returns f(x,y) to each, without revealing any intermediate results 
of the computation. Naturally, the goal is to achieve this end result with equal 
security, even when no such party is ready and willing to help. 

Several successful designs for such protocols have been developed, including 
work by Yao [Yao82a, Yao82b], Goldreich, Micali, and Wigderson [GMW87], 
Chaum, Damgård, and van de Graaf [CDG87], Kilian [K88], and Crépeau, Tapp
and van de Graaf [CTG95], among others. These approaches span quite differ- 
ent fault models, ranging from computationally-bounded to infinitely-powerful 
attackers, and from static to adaptive security. 

The work presented here addresses the information-theoretic domain, where
perfect or statistical security is demanded, and adversaries have infinite
computing resources. In this setting, Kilian's work is the pioneering achievement:
two-party computation with statistical security against infinitely-powerful
attackers is possible, using Oblivious Transfer as a primitive [K88]. (Oblivious

Wen-Lian Hsu, Ming-Yang Kao (Eds.): COCOON‘98, LNCS 1449, pp. 361-370, 1998. 
Transfer is a process by which Alice transmits bit b to Bob; it arrives with 50-50
probability; Alice does not know whether it arrived, but Bob does [Rab81].)

The protocol presented in [K88] certainly requires only polynomial time, but 
its complexity is high in practical terms. Recent results by Crepeau, Tapp and 
van de Graaf have reduced the complexity greatly [CTG95]. In both of these 
works, the fundamental steps are bitwise logical operations, built ultimately 
from oblivious transfers (including “committed” and “one-out-of-two” versions). 

Our goal is to seek algorithmic speed-ups and simplifications by supporting
field operations rather than ANDs and EXORs. Using general-purpose OT-
based results, a direct multiplication of two m-bit elements involves a circuit of
size m?; each gate requires a subprotocol invoking 0(k^) OT^s [CTG95]; thus 
communication complexity is on the order of 0{m^k^). The protocols presented 
here, however, require only $O(km)$ message bits. Moreover, a direct multiplica-
tion requires $\Omega(m)$ rounds of communication (or at least $\log m$ with more clever
circuit design) using OT-based methods, whereas our protocol requires 1 round.

In part, we bring into question the centrality of OT and describe computa- 
tions that are simpler to understand and to achieve using equally “arbitrary” 
(but certainly somewhat more complicated) primitives. These primitives are op- 
timized for particular operations as well as for easy production by an underlying 
channel or third-party. 

Which Tools to Use? The ubiquitous applicability of OT has led to an over- 
whelming amount of effort toward finding methods to implement it efficiently 
and securely. 

But the engineer's central question of how to achieve OT overshadows a 
deeper one: if we employ significantly complicated mechanisms to achieve OT 
(in order to thereby enable cryptographic protocols), then can these mechanisms 
and their assumptions be used in a simpler or alternative fashion to power more 
suitable primitives? 

We attempt to simplify the higher-level specification of a two-party task 
by supplying optimized arithmetic operations, rather than bitwise logical ones. 
Naturally, these results do not achieve anything more general than what OT 
already enables. But they achieve it faster and in a clearer, more direct manner, 
where field computations are involved. 

Third-Party Assistance. Oblivious transfer is often treated as an underlying com- 
munication channel with specialized properties (such as guaranteed and measur- 
able noise). A slightly different twist is to regard it as a service provided by a 
third-party, M. 

From the view of trust management, it seems almost circular to use OT to
help Alice and Bob compute f(x, y): if M is trusted to implement OT on sensitive
bits b, why doesn't M just accept x and y and compute f(x, y) directly? In either
case, M sees both secret inputs.

Through precomputed OT [B95], however, the distinction becomes clearer.
By executing OT in advance on bits that are unrelated to their desired input bits,
Alice and Bob can achieve the later computation of f(x, y) without requiring
on-line assistance from M. Equally importantly, they achieve their computation
without ever transmitting sensitive information to M. Thus, the service provider
(or "channel") M is trusted in a significantly less extensive manner.
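The idea of running OT in advance on unrelated bits can be illustrated with the well-known derandomization of one-out-of-two OT. The Python sketch below is an assumption-laden illustration rather than the exact construction of [B95]: it uses the one-out-of-two variant (not the 50-50 variant defined above), models M as a dealer in `offline_phase`, and all names are hypothetical.

```python
import secrets


def offline_phase():
    """M deals random pads: Alice gets (r0, r1); Bob gets (c, r_c)."""
    r0, r1 = secrets.randbits(1), secrets.randbits(1)
    c = secrets.randbits(1)
    return (r0, r1), (c, (r0, r1)[c])


def online_phase(alice_pads, bob_pad, m0, m1, b):
    """Bob learns m_b without M; returns the bit Bob recovers."""
    r0, r1 = alice_pads
    c, r_c = bob_pad
    e = b ^ c                       # Bob -> Alice: masks Bob's real choice b
    f0 = m0 ^ (r1 if e else r0)     # Alice -> Bob: m0 padded with r_e
    f1 = m1 ^ (r0 if e else r1)     # Alice -> Bob: m1 padded with r_{1-e}
    return (f1 if b else f0) ^ r_c  # Bob removes the one pad he knows
```

M only ever sees the random values dealt in the offline phase, never `m0`, `m1` or `b`, which mirrors the reduced trust described in the text.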

One-Time Tables. Assuming, then, that a trusted third party is willing to give
or sell its assistance to Alice and Bob in an initial and relatively passive manner,
is OT really the product they wish to purchase?

The two-party one-time table (OTT) is an appealing alternative. The OTT
is theoretically equivalent to OT (in terms of privacy) but specially tailored for
expressing computations in arithmetic terms. It simplifies Alice's and Bob's
interaction by supporting direct, field-based computation, by avoiding extensive
gate-based cut-and-choose operations, and by obviating intricate commitment
protocols. The product is simple for the trusted party to generate, as well,
requiring only random number generation and field-element computations.

In addition to simplifying secure computations, the OTT is designed to
support off-line assistance - where, namely, the trusted service provider helps out
initially but departs without learning (or needing to learn) Alice's and Bob's
desired computation. This simplifies trust management for Alice and Bob, who
need rely less extensively on the integrity of the server (or underlying channel).

Although we do not give details here, it is also possible to construct a robust 
OTT from resources provided by more than one server, even though one or more 
of the servers may be passively or actively faulty. Because Alice and Bob can 
seek out and choose among various, competing, cryptographic service providers, 
our solution matches well with the commodity-based approach for server-assisted 
cryptography proposed in [B97]. 

Results. Our main result is the following:

Theorem 1. Let Alice and Bob have access to a two-party one-time table. There
exists a statistically-secure two-party protocol for oblivious circuit evaluation that
uses at most one message round per gate level and supports field operations
including linear combinations, multiplication, and division.

Properties and Comparisons. The goals sought and achieved in this paper differ
from earlier approaches in several ways.

- Computational vs. Information-Theoretic Security. Speedups due to homo-
morphisms and various number-theoretic properties are inapplicable here, as
they do not provide information-theoretic security.

- Lazy Verification. Many earlier approaches use zero-knowledge proofs, cut-
and-choose, and other methods to ensure correctness at each stage. Our
OTT construction allows verification to be left until the final stage, thereby
providing faster and simpler algorithms.

- Commodity-Based Cryptography. This work shares a basic motivation be-
hind commodity-based cryptography [B97]: enlist third-party assistance to
achieve a secure, joint computation. [B97] provided support for achieving
OT using multiple servers. Here, we seek to replace OT and we address the
case of only one server. While the methods presented here can be extended
to rely on multiple servers, they are nonetheless distinct (whether or not
extended to multiple servers).
2 Two-Party One-Time Tables

Let k be a security parameter, let t = 2k + 2, let N = 3t + 1, and let F be a field
of size exceeding N + 1. (If an arbitrary field is desired, an extension is suitable.)
Let $R = \{\xi_1, \ldots, \xi_N\}$ be the roots of codewords for a BCH code over F [ML85].

We generally follow a notational convention that superscripts describe the
party that holds a value; e.g., a string with superscript A is held by Alice.

Let $(\mathrm{Comm}_A^A(x), \mathrm{Comm}_A^B(x))$ be a committal by A to value x. That is,
$\mathrm{Comm}_A^A(x)$ is a string held privately by A, and $\mathrm{Comm}_A^B(x)$ is a string held
privately by B that reveals nothing about x. A can unveil x by conveying
$\mathrm{Comm}_A^A(x)$, while B can verify this unveiling by using $\mathrm{Comm}_A^B(x)$. (Further
details are described later.)

2.1 Wires 

A wire is a set of committed values known to each party, respectively. A wire
exists in one of two states: prepared and dedicated.

A prepared wire represents some random field element, a. Alice holds the
values $\{\phi_i^A(a)\}_{i=1..N}$; Bob holds $\{\phi_i^B(a)\}_{i=1..N}$. (Unless otherwise noted, these
sets will implicitly span i = 1..N.) These values are selected uniformly at random,
constrained by $f(\xi_i) = \phi_i^A(a) + \phi_i^B(a)$, where $f(\cdot)$ is a (uniformly random)
polynomial of degree t = 2k + 2, satisfying f(0) = a. Although there are only two
parties involved, this construction is similar to Shamir's secret sharing [Sha79]
combined with sum-sharing of the pieces.
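The construction of a prepared wire can be sketched as follows. This is only an illustration under stated assumptions: the field is taken to be GF(P) for the prime P = 2^31 - 1, the evaluation points are simply 1..N (the paper takes them from a BCH code), and the names `deal_prepared_wire` and `interpolate_at_zero` are hypothetical.

```python
import random

P = 2_147_483_647  # assumed prime modulus (2^31 - 1); any field larger than N + 1 would do


def interpolate_at_zero(points, p=P):
    """Lagrange-interpolate the (x_i, y_i) pairs and return the value at x = 0."""
    total = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if j != i:
                num = num * (-xj) % p
                den = den * (xi - xj) % p
        total = (total + yi * num * pow(den, -1, p)) % p
    return total


def deal_prepared_wire(a, k, rng, p=P):
    """Return (xs, alice_shares, bob_shares) for a prepared wire with core value a."""
    t = 2 * k + 2
    n = 3 * t + 1
    xs = list(range(1, n + 1))  # assumption: evaluation points are 1..N
    # Random polynomial f of degree at most t with f(0) = a.
    coeffs = [a % p] + [rng.randrange(p) for _ in range(t)]
    f_vals = [sum(c * pow(x, e, p) for e, c in enumerate(coeffs)) % p for x in xs]
    # Sum-share each evaluation f(x_i) between Alice and Bob.
    alice = [rng.randrange(p) for _ in xs]
    bob = [(fv - sa) % p for fv, sa in zip(f_vals, alice)]
    return xs, alice, bob


rng = random.Random(0)
xs, alice, bob = deal_prepared_wire(a=1234, k=2, rng=rng)
sums = [(x, (sa + sb) % P) for x, sa, sb in zip(xs, alice, bob)]
recovered = interpolate_at_zero(sums)  # pooling both halves recovers the core value
```

Neither list of shares alone carries information about a, since each of Alice's values is an independent uniform field element and Bob's values are offset by them.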

(As mentioned, all values are committed. That is, for each i, Alice holds
$\mathrm{Comm}_A^A(\phi_i^A(a))$ while Bob holds $\mathrm{Comm}_A^B(\phi_i^A(a))$. Bob is likewise committed.)

A dedicated wire adjusts the represented value to some particular element.
That is, a dedicated wire is the combination of the prepared wire with a public
correction $\Delta a$ to the (secret) represented value, a:

$\psi^A(x) = (\Delta a, \{\phi_i^A(a)\})$ (Alice)

$\psi^B(x) = (\Delta a, \{\phi_i^B(a)\})$ (Bob).

Here, the value represented by the wire is $x = a + \Delta a$. Again, although there are
only two parties involved here, the construction is inspired by Beaver's circuit
randomization technique [B91].

We refer to the value a on a dedicated wire w as the core value on w, and to
$\Delta a$ as the correction to w.

Input Wires. Prepared wires can also be constructed as input wires. An input
wire for Alice is a prepared wire for which Alice is given all of the values in the
representation, namely $\{\phi_i^B(a)\}$ as well. (One way to imagine this is as though
Bob simply unveiled all of his values for that wire.) Input wires for Bob are
defined similarly.

Virtual Wires. Another convenient concept is that of a virtual wire. A virtual 
wire is a wire (namely a collection of values) constructed from other wire values, 
and in this way distinguished from wires constructed before the outset of the 
protocol. Virtual wires are employed in the same way as original wires; the 
distinction is primarily for explanatory purposes. 
2.2 Gates

A multiplication gate is a set of three prepared wires representing secret values
r, s, and rs, where r and s are random field elements (cf. [B91]). Other gates
(e.g. inversion) are described later.

Like wires, gates exist in two states, prepared and dedicated, depending on
whether all of their wires are prepared or not. Gates are used in a one-time
fashion: once dedicated, they are never rededicated. The evaluation of gates
(including addition, multiplication, and multiplicative inversion gates) is
described in §3.2.

2.3 Production and Supply 

A trusted third party will supply some number Ni of input wires for Alice and Ni 
for Bob; some number Np of prepared wires; some number Nm of multiplication 
gates, and some number No of inversion gates. The construction of these values 
requires a good random number generator and simple arithmetic. This collection 
of values comprises the two-party one-time table, with half destined for Alice and 
half destined for Bob. 

The trusted party is thereafter not involved, although it (or another party) 
may later mediate key exchange to enable fair revelation of the final result. 

2.4 Committal 

The one-time table employs a statistically-secure committal scheme. Because
the trusted party is supplying the one-time table, it builds the committal
simultaneously, so that Alice and Bob need not execute a committal protocol.
One arithmetic-based way to implement this is to commit Alice to a by
selecting a random line y = a + bx along with a random u, then supplying
Alice with $\mathrm{Comm}_A^A(a) = (a, b)$ and Bob with $\mathrm{Comm}_A^B(a) = (u, v = a + bu)$ (cf.
[RB89, TW86]). Alice unveils by revealing (a, b). Bob checks that v = a + bu.
Her chance of successfully cheating is bounded by 1/(|F| - 1). Other commitment
schemes are also suitable.
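A minimal sketch of this line-based committal, assuming a prime field GF(P) with P = 2^31 - 1. The function names `commit` and `verify` are illustrative, and drawing u from the nonzero elements is an assumption made here so that Bob's point never coincides with the committed value itself.

```python
import random

P = 2_147_483_647  # assumed prime modulus (2^31 - 1)


def commit(a, rng, p=P):
    """Supplier side: return (Alice's string, Bob's string) committing Alice to a."""
    b = rng.randrange(p)      # random slope of the line y = a + b*x
    u = rng.randrange(1, p)   # assumption: nonzero evaluation point, kept secret from Alice
    v = (a + b * u) % p       # the point (u, v) lies on the line
    return (a % p, b), (u, v)


def verify(opening, check, p=P):
    """Bob's check: does the line revealed by Alice pass through his point?"""
    a, b = opening
    u, v = check
    return (a + b * u) % p == v


rng = random.Random(7)
alice_part, bob_part = commit(42, rng)
honest_ok = verify(alice_part, bob_part)

# Alice tries to unveil a different value while keeping the same slope.
forged = ((alice_part[0] + 1) % P, alice_part[1])
forged_ok = verify(forged, bob_part)
```

To unveil a different value successfully, Alice would have to pick a new slope whose line still passes through Bob's point, which requires guessing u.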

2.5 Arithmetic: Moduli and Fields 

Because there are only two clients, Alice and Bob, the protocol is not as imme- 
diately tied to particular moduli or fields as may be the case with multiparty 
computation, where field sizes must (for example) exceed the number of parties. 
Instead, the security parameter places a lower bound on the field size used for 
the OTT. Extension fields can be used to accommodate the case where a desired 
prime modulus (or given field) is smaller than the security parameter (more
accurately, 2k + 4). Simultaneous execution using different prime moduli, using the
Chinese Remainder Theorem at input and output, can accommodate factored, 
composite moduli. (Note that the multiplicative inverse protocol must still be 
restricted to elements that are guaranteed to be invertible, else information may 
be leaked.) 

3 Unverified Computation 

Input and circuit evaluation consist of turning prepared wires into dedicated
ones, thereby propagating inputs from the input side of a circuit $C_f$ for f(x, y)
to the output side. We shall assume that f(x, y) has been described in terms of
addition, multiplication, and multiplicative inversion operations over field F; we
focus on evaluating individual operations.

3.1 Input 

When Alice has input value x, she takes the next prepared input wire (for Alice)
and announces $\Delta a = x - a$, without revealing x and a, of course. Bob provides
inputs in a similar fashion. Clearly, $\Delta a$ appears to Bob as a random element
independent of x, since a is uniformly random (and not otherwise used).
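As a small worked example (the field GF(11) and the specific numbers are chosen here purely for illustration), take core value a = 7 and input x = 3:

```latex
\begin{align*}
\Delta a &= x - a = 3 - 7 \equiv 7 \pmod{11},\\
a + \Delta a &= 7 + 7 = 14 \equiv 3 = x \pmod{11}.
\end{align*}
```

Any other input x' would produce the same announcement for the core value a' = x' - 7, which is why the correction by itself tells Bob nothing about x.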

3.2 Evaluation 

We focus on evaluating addition and multiplication. Multiplicative inverses are 
a straightforward application of multiplication, using a method first described 
in [BBSS], but require additional optional mechanisms. 

Addition. Let $(\psi^A(x), \psi^B(x))$ represent the value x on dedicated wire $w_1$, and
$(\psi^A(y), \psi^B(y))$ represent the value y on dedicated wire $w_2$. We first show how
to construct a virtual, dedicated wire representing the value z = x + y.

In more detail, let a be the core value on $w_1$ and b the core value on $w_2$; let
$\Delta a$ and $\Delta b$ be the commonly-known corrections. To add the values held by two
dedicated wires, Alice and Bob each locally add their shares as follows:



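Wire Value            Alice                                                                 Bob

$x = a + \Delta a$    $\psi^A(x) = (\Delta a, \{\phi_i^A(a)\})$                             $\psi^B(x) = (\Delta a, \{\phi_i^B(a)\})$

$y = b + \Delta b$    $\psi^A(y) = (\Delta b, \{\phi_i^A(b)\})$                             $\psi^B(y) = (\Delta b, \{\phi_i^B(b)\})$

$z = x + y$           $\psi^A(z) = (\Delta a + \Delta b, \{\phi_i^A(a) + \phi_i^A(b)\})$    $\psi^B(z) = (\Delta a + \Delta b, \{\phi_i^B(a) + \phi_i^B(b)\})$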



Remarks. (1) Even though Alice and Bob may have been committed to the
values on the input wires, they are not directly committed to the value on the
virtual wire. At the moment, there is nothing to prevent them from pretending that
their values for it are different, but this will be taken care of later. (2) Linear
combinations are easily evaluated by replacing each sum above with the desired
linear combination.
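The local addition step can be sketched as below. The sketch assumes a prime field GF(P), evaluation points 1..N, and k = 2; commitments are omitted, and the helper names (`share`, `dedicate`, `add_local`, `open_wire`) are hypothetical.

```python
import random

P = 2_147_483_647               # assumed prime modulus (2^31 - 1)
T = 6                           # t = 2k + 2 with k = 2
XS = list(range(1, 3 * T + 2))  # N = 3t + 1 = 19 points; assumption: points are 1..N


def interpolate_at_zero(points, p=P):
    """Lagrange-interpolate the (x_i, y_i) pairs and return the value at x = 0."""
    total = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if j != i:
                num = num * (-xj) % p
                den = den * (xi - xj) % p
        total = (total + yi * num * pow(den, -1, p)) % p
    return total


def share(value, rng):
    """Sum-share the evaluations of a random polynomial f (degree <= T) with f(0) = value."""
    coeffs = [value % P] + [rng.randrange(P) for _ in range(T)]
    alice, bob = [], []
    for x in XS:
        fx = sum(c * pow(x, e, P) for e, c in enumerate(coeffs)) % P
        sa = rng.randrange(P)
        alice.append(sa)
        bob.append((fx - sa) % P)
    return alice, bob


def dedicate(x, rng):
    """Return Alice's and Bob's views (correction, shares) of a dedicated wire for x."""
    a = rng.randrange(P)
    alice, bob = share(a, rng)
    delta = (x - a) % P
    return (delta, alice), (delta, bob)


def add_local(w1, w2):
    """One party's local addition: add the corrections and add the shares channel-wise."""
    return ((w1[0] + w2[0]) % P, [(s1 + s2) % P for s1, s2 in zip(w1[1], w2[1])])


def open_wire(wa, wb):
    """Pool both views to recover the represented value (core value plus correction)."""
    sums = [(x, (sa + sb) % P) for x, sa, sb in zip(XS, wa[1], wb[1])]
    return (interpolate_at_zero(sums) + wa[0]) % P


rng = random.Random(1)
xa, xb = dedicate(10, rng)
ya, yb = dedicate(32, rng)
za, zb = add_local(xa, ya), add_local(xb, yb)  # no communication needed
z = open_wire(za, zb)
```

A general linear combination would replace the two sums in `add_local` with the corresponding weighted sums.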

Multiplication. The crux of the process is to multiply two values x and y to
obtain a third, z, without revealing any of them. Let dedicated wires $w_x$ and
$w_y$ represent the values x and y through core values a and b with corrections
$\Delta a$ and $\Delta b$, respectively. Alice and Bob select the next available undedicated
multiplication gate g, which, as explained above, consists of three prepared wires
$w_1$, $w_2$, and $w_3$ representing the values r, s, and rs, respectively.

Let L interpolate a polynomial through n elements; namely:

$L(\{(x_i, y_i)\}_{i=1..n}, x) = \sum_{i=1}^{n} \prod_{j \ne i} \frac{x - x_j}{x_i - x_j} \, y_i.$

Let Interpolate denote a method that interpolates a set of elements (i.e., computes
L) and returns the free term if the polynomial is of degree at most t.
Otherwise, the method throws an exception, aborting the protocol.

The multiplication protocol is described in Fig. 1. For now, assume that 
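Multiply-Subroutine

1.1. A:      $\alpha_{A,i} \leftarrow \phi_i^A(a) - \phi_i^A(r)$
             $\beta_{A,i} \leftarrow \phi_i^A(b) - \phi_i^A(s)$

1.2. A → B:  $\{\alpha_{A,i}, \beta_{A,i}\}$

1.3. B:      $\alpha_{B,i} \leftarrow \phi_i^B(a) - \phi_i^B(r)$
             $\beta_{B,i} \leftarrow \phi_i^B(b) - \phi_i^B(s)$

1.4. B → A:  $\{\alpha_{B,i}, \beta_{B,i}\}$

2.1  A:      $\alpha \leftarrow \mathrm{Interpolate}(\{(\xi_i, \alpha_{A,i} + \alpha_{B,i})\})$
             $\beta \leftarrow \mathrm{Interpolate}(\{(\xi_i, \beta_{A,i} + \beta_{B,i})\})$
             $\phi_i^A(c) \leftarrow \phi_i^A(rs) + \phi_i^A(r)(\beta + \Delta b) + \phi_i^A(s)(\alpha + \Delta a)$
             $\Delta c \leftarrow (\alpha + \Delta a)(\beta + \Delta b)$

2.2  B:      $\alpha \leftarrow \mathrm{Interpolate}(\{(\xi_i, \alpha_{A,i} + \alpha_{B,i})\})$
             $\beta \leftarrow \mathrm{Interpolate}(\{(\xi_i, \beta_{A,i} + \beta_{B,i})\})$
             $\phi_i^B(c) \leftarrow \phi_i^B(rs) + \phi_i^B(r)(\beta + \Delta b) + \phi_i^B(s)(\alpha + \Delta a)$
             $\Delta c \leftarrow (\alpha + \Delta a)(\beta + \Delta b)$

Fig. 1. Two-pass protocol to evaluate a multiplication gate.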



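Alice and Bob are honest. Noting that L is linear (that is,
$L(\{(\xi_i, u_i + v_i)\}, x) = L(\{(\xi_i, u_i)\}, x) + L(\{(\xi_i, v_i)\}, x)$
and $L(\{(\xi_i, c u_i)\}, x) = c \cdot L(\{(\xi_i, u_i)\}, x)$), it is
easy to see that the values $\alpha$ and $\beta$ evaluate to the following, by design:

$\alpha = L(\{(\xi_i, \alpha_{A,i} + \alpha_{B,i})\}, 0)$

$= L(\{(\xi_i, \phi_i^A(a) + \phi_i^B(a) - \phi_i^A(r) - \phi_i^B(r))\}, 0) = a - r$

and likewise for $\beta = b - s$. As a result,

$c = L(\{(\xi_i, \phi_i^A(c) + \phi_i^B(c))\}, 0)$

$= L(\{(\xi_i, \phi_i^A(rs) + \phi_i^B(rs))\}, 0) + (\beta + \Delta b) L(\{(\xi_i, \phi_i^A(r) + \phi_i^B(r))\}, 0)$

$+ (\alpha + \Delta a) L(\{(\xi_i, \phi_i^A(s) + \phi_i^B(s))\}, 0)$

$= rs + (\beta + \Delta b) r + (\alpha + \Delta a) s,$

from which $c + \Delta c = xy$ can be verified through direct algebra.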
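The two passes of the multiplication protocol can be simulated end to end as a sanity check of the algebra. This is a sketch under assumptions, not the authors' implementation: it uses a prime field GF(P), evaluation points 1..N, and k = 2, plays both parties in one process, omits commitments, and realizes Interpolate by fitting the first t + 1 points and rejecting if any remaining point disagrees. All function names are hypothetical.

```python
import random

P = 2_147_483_647               # assumed prime modulus (2^31 - 1)
T = 6                           # t = 2k + 2 with k = 2
XS = list(range(1, 3 * T + 2))  # N = 3t + 1 = 19 points; assumption: points are 1..N


def lagrange_eval(points, x, p=P):
    """Evaluate at x the Lagrange polynomial through the given (x_i, y_i) pairs."""
    total = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if j != i:
                num = num * (x - xj) % p
                den = den * (xi - xj) % p
        total = (total + yi * num * pow(den, -1, p)) % p
    return total


def interpolate(values):
    """Return the free term if the values fit a polynomial of degree <= T, else abort."""
    pts = list(zip(XS, values))
    base = pts[:T + 1]
    if any(lagrange_eval(base, x) != y for x, y in pts[T + 1:]):
        raise ValueError("values do not lie on a polynomial of degree <= t: abort")
    return lagrange_eval(base, 0)


def share(value, rng):
    """Sum-share the evaluations of a random polynomial f (degree <= T) with f(0) = value."""
    coeffs = [value % P] + [rng.randrange(P) for _ in range(T)]
    alice, bob = [], []
    for x in XS:
        fx = sum(c * pow(x, e, P) for e, c in enumerate(coeffs)) % P
        sa = rng.randrange(P)
        alice.append(sa)
        bob.append((fx - sa) % P)
    return alice, bob


def sub(u, v):
    return [(x - y) % P for x, y in zip(u, v)]


def add(u, v):
    return [(x + y) % P for x, y in zip(u, v)]


def local_product(rs_sh, r_sh, s_sh, alpha, beta, da, db):
    """Step 2 for one party: its shares of c and the public correction for c."""
    c_sh = [(rs + r * (beta + db) + s * (alpha + da)) % P
            for rs, r, s in zip(rs_sh, r_sh, s_sh)]
    return c_sh, (alpha + da) * (beta + db) % P


def multiply(x, y, rng):
    a, b, r, s = (rng.randrange(P) for _ in range(4))
    da, db = (x - a) % P, (y - b) % P
    aA, aB = share(a, rng)
    bA, bB = share(b, rng)
    rA, rB = share(r, rng)
    sA, sB = share(s, rng)
    rsA, rsB = share(r * s % P, rng)
    # Step 1: each party computes and announces its channel-wise differences.
    alA, beA = sub(aA, rA), sub(bA, sA)
    alB, beB = sub(aB, rB), sub(bB, sB)
    # Step 2: both parties interpolate alpha = a - r and beta = b - s.
    alpha = interpolate(add(alA, alB))
    beta = interpolate(add(beA, beB))
    cA, dc = local_product(rsA, rA, sA, alpha, beta, da, db)
    cB, _ = local_product(rsB, rB, sB, alpha, beta, da, db)
    # Opening the result wire (done here only to check the algebra).
    return (interpolate(add(cA, cB)) + dc) % P


product = multiply(6, 7, random.Random(3))
```

The returned value is the core value c plus the correction, which by the derivation above equals xy.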

3.3 Output 

In §4, we describe how Alice and Bob verify that each has behaved sufficiently
properly for them to continue with the final stage, namely the revelation of the
output, f(x, y). If that verification succeeds, they simply unveil their output
wires and interpolate the results. That is, for each output z on a wire with
core value a and correction $\Delta a$, Alice unveils $\{\phi_i^A(a)\}$ and Bob unveils $\{\phi_i^B(a)\}$.
Using these values and the commonly-known value of $\Delta a$, they each use BCH
decoding methods to correct up to t errors, computing (after error correction):

$z = L(\{(\xi_i, \phi_i^A(a) + \phi_i^B(a))\}, 0) + \Delta a.$



Fairness. Because this is a two-party setting, malicious faults enable one party to
withhold information after it has learned something from the other. The results
of Cleve apply [Cle86]: with simple adjustments that cost extra rounds, the
amount of advantage gained by withholding information can be limited to an
optimal, inverse polynomial amount.
Alternatively, the help of a third party can be invoked. This exchange agent
can be the original supplier or a new server. To prevent leaking the final value to
the exchange agent, Alice and Bob modify f so that it accepts an extra random
input from each and masks the output with that input:

$\hat{f}(x \circ r, y \circ s) = (f(x, y) + r, f(x, y) + s).$

After running the protocol to compute $\hat{f}$ instead, Alice and Bob provide the
exchange agent with their complete set of information regarding the final wire,
a. For Alice, this information comprises

$\{(\phi_i^A(a), \mathrm{Comm}_A^A(\phi_i^A(a)), \mathrm{Comm}_B^A(\phi_i^B(a)))\}.$

When the exchange agent receives the two messages containing this information,
it verifies them against each other. If no errors
are found, it forwards each message to the other player. After receiving the
forwarded message, each player then sends its message directly to the other (to
avoid falsification on the part of the exchange agent). Each player then finishes 
the protocol as specified earlier (through error correction and interpolation), 
unmasking the result with r or s, respectively. 
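The output masking can be illustrated with a few lines of code. This sketch works on plain values rather than on wires, assumes arithmetic modulo a prime P, and uses a made-up example function; the names `masked` and `f_hat` are hypothetical.

```python
P = 2_147_483_647  # assumed prime modulus (2^31 - 1)


def masked(f, p=P):
    """Wrap f so that each party's copy of the output is hidden by its own random mask."""
    def f_hat(x, r, y, s):
        out = f(x, y) % p
        return (out + r) % p, (out + s) % p
    return f_hat


def f(x, y):
    """Example function, chosen only for illustration."""
    return (x * y + 1) % P


f_hat = masked(f)
r, s = 1000, 2000                       # in practice, uniformly random field elements
for_alice, for_bob = f_hat(6, r, 7, s)  # what the exchange agent could reconstruct
alice_result = (for_alice - r) % P      # Alice removes her mask
bob_result = (for_bob - s) % P          # Bob removes his mask
```

Because r and s are uniform and known only to their owners, the masked outputs seen by the exchange agent are themselves uniform.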

4 Lazy Verification 

Notice that no mention of protection against malicious behavior dirtied the
preceding description. Nor are we about to suggest the insertion of zero-knowledge
proofs or cut-and-choose methods at each step. Instead, the verification occurs at
the very end, before the output is revealed. All gates are verified simultaneously.

This means that Bob can trivially and maliciously add 1 to an intermediate
wire value, for example, during the computation stage. All he need do is to
increment each of his values for that wire. Our goal is to enable Alice (and Bob) to
detect such misbehavior just before the final output is revealed, and to abort if
so.

Let us refer to the data associated with a given index $i \in [1..N]$ as a channel.
For example, the values $\phi_i^A(x)$ and $\phi_i^B(x)$ over all x form part of channel i.

The verification process is simple: Alice randomly selects k channels
$I = \{i_1, \ldots, i_k\} \subseteq [1..N]$ and announces them. Bob then randomly selects k
remaining channels $J = \{j_1, \ldots, j_k\} \subseteq [1..N] \setminus I$ and announces them.

Alice and Bob then unveil all values indexed by $I \cup J$, throughout the entire
circuit. For instance, where Alice holds $\{\phi_i^A(a)\}$, she unveils $\{\phi_i^A(a) : i \in I \cup J\}$.

Alice and Bob then individually check whether those values are consistent
with the sums announced on those channels in multiplication steps. (This also
requires propagating the values through virtual wires, e.g., where two intermediate
results were added.) If a party finds any discrepancy whatsoever, it aborts.
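The channel selection and the consistency check for a single multiplication step might look as follows. This is a simplified sketch: it assumes a prime field GF(P) and k = 2, treats the unveiled shares as already authenticated by the committal scheme, and checks only Bob's announced differences. The names `pick_channels` and `consistent` are hypothetical.

```python
import random

P = 2_147_483_647        # assumed prime modulus (2^31 - 1)
K = 2                    # security parameter k
N = 3 * (2 * K + 2) + 1  # number of channels, N = 3t + 1 with t = 2k + 2


def pick_channels(rng, n=N, k=K):
    """Alice picks k channels, then Bob picks k of the remaining ones."""
    chosen_by_alice = rng.sample(range(n), k)
    rest = [i for i in range(n) if i not in chosen_by_alice]
    chosen_by_bob = rng.sample(rest, k)
    return chosen_by_alice, chosen_by_bob


def consistent(channels, a_shares, r_shares, announced):
    """Do the announced differences match the unveiled shares on the revealed channels?"""
    return all((a_shares[i] - r_shares[i]) % P == announced[i] for i in channels)


rng = random.Random(5)
bob_a = [rng.randrange(P) for _ in range(N)]  # Bob's shares of core value a
bob_r = [rng.randrange(P) for _ in range(N)]  # Bob's shares of the gate value r
honest = [(x - y) % P for x, y in zip(bob_a, bob_r)]
shifted = [(v + 1) % P for v in honest]       # Bob adds 1 on every channel

I, J = pick_channels(rng)
revealed = I + J
honest_passes = consistent(revealed, bob_a, bob_r, honest)
cheat_passes = consistent(revealed, bob_a, bob_r, shifted)
```

A uniform shift like this one still interpolates to a polynomial of degree at most t, so only the comparison against unveiled values exposes it.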

5 Statistical Security 

Because the core values are secret and uniformly random, the correction values 
are uniformly random values, regardless of the ultimate wire values. Because the 
secret wire values are used only once, the correction values reveal nothing about 
the actual values. 

The verification procedure reveals some 2k channels. Because the represen- 
tation on each wire employs a secret random polynomial of degree t, followed 
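Copy-Subroutine

1.1. A:      $\alpha_{A,i} \leftarrow \phi_i^A(a) - \phi_i^A(r)$

1.2. A → B:  $\{\alpha_{A,i}\}$

1.3. B:      $\alpha_{B,i} \leftarrow \phi_i^B(a) - \phi_i^B(r)$

1.4. B → A:  $\{\alpha_{B,i}\}$

2.1  A:      $\alpha \leftarrow \mathrm{Interpolate}(\{(\xi_i, \alpha_{A,i} + \alpha_{B,i})\})$
             $\Delta r \leftarrow \Delta a + \alpha$

2.2  B:      $\alpha \leftarrow \mathrm{Interpolate}(\{(\xi_i, \alpha_{A,i} + \alpha_{B,i})\})$
             $\Delta r \leftarrow \Delta a + \alpha$

Fig. 2. Protocol to copy a wire value (fanout 1).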

by a random pair summation, the set of values $\{(\phi_i^A(a), \phi_i^B(a)) : i \in I \cup J\}$ is
uniformly distributed over $(F \times F)^{2k}$.

If Alice attempts to depart from the protocol, then Bob aborts the protocol 
unless Alice either (1) unveils at least one incorrect value successfully, or (2) 
announces at least $N - t$ incorrect values in some computation step. (If she
makes fewer changes, then an $\alpha$ value will not interpolate correctly, and Bob
will abort.) 

Case (1) clearly occurs with probability $o(k^{-c})$ for any c. In case (2), each
of Bob's k random index choices gives a chance of at least k/N > 1/7 to give
detection. Alice's chance of success is at most $(6/7)^k$, which is $o(k^{-c})$ for any c.
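The last step is the standard comparison of exponential and polynomial decay, spelled out here for completeness:

```latex
\[
\left(\tfrac{6}{7}\right)^{k} = e^{-k \ln(7/6)},
\qquad
\lim_{k \to \infty} k^{c} \, e^{-k \ln(7/6)} = 0
\quad \text{for every fixed } c > 0,
\]
```

so the bound decays faster than any inverse polynomial in k.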

6 Copying and Multiplicative Inverses 

Because a fraction of the channels are revealed whenever a wire is verified, it is
clear that wires cannot be reused indefinitely for multiple computations. This
does not place a bound on the fanout of a given gate or input during a single
computation; the sister values "down the line" are verified simultaneously at the
end. If wires are to be used in later computations, or revealed (for whatever
reason) midstream, then further techniques are needed to realize unrestricted
fanout. These techniques are described below, for copying and for fast (O(1)-round),
direct multiplicative inversion.

Copying. The one-time table is expanded to include copy gates. A copy gate is a
set of two or three prepared wires representing the same secret random value r.
The first wire is designated the input wire and the other one or two are outputs.
A fanout-1 gate is essentially a renewal of the hidden randomness used to encode
values, while a fanout-2 gate is useful for modelling multiple-fanout circuit gates.

The procedure described in Fig. 2 shows how to execute a copy fanout-1
operation on some dedicated wire w with core value a and correction $\Delta a$. Higher
fanout is a simple generalization. It is straightforward to show that $\alpha = a - r$.
Furthermore, $r + \Delta r = r + \Delta a + (a - r) = \Delta a + a = x$.
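A fanout-1 copy can be simulated in the same style to confirm that the output wire represents x. The sketch assumes a prime field GF(P), evaluation points 1..N, and k = 2; it plays both parties in one process, omits commitments and the degree check, and models the gate's input and output wires as two independent sharings of the same r. All names are hypothetical.

```python
import random

P = 2_147_483_647               # assumed prime modulus (2^31 - 1)
T = 6                           # t = 2k + 2 with k = 2
XS = list(range(1, 3 * T + 2))  # N = 3t + 1 = 19 points; assumption: points are 1..N


def interpolate_at_zero(values, p=P):
    """Lagrange-interpolate the pairs (XS[i], values[i]) and return the free term."""
    total = 0
    for i, (xi, yi) in enumerate(zip(XS, values)):
        num, den = 1, 1
        for j, xj in enumerate(XS):
            if j != i:
                num = num * (-xj) % p
                den = den * (xi - xj) % p
        total = (total + yi * num * pow(den, -1, p)) % p
    return total


def share(value, rng):
    """Sum-share the evaluations of a random polynomial f (degree <= T) with f(0) = value."""
    coeffs = [value % P] + [rng.randrange(P) for _ in range(T)]
    alice, bob = [], []
    for x in XS:
        fx = sum(c * pow(x, e, P) for e, c in enumerate(coeffs)) % P
        sa = rng.randrange(P)
        alice.append(sa)
        bob.append((fx - sa) % P)
    return alice, bob


def copy_fanout1(x, rng):
    """Copy the value x from a dedicated wire onto a fresh wire and open the copy."""
    a, r = rng.randrange(P), rng.randrange(P)
    delta_a = (x - a) % P
    a_A, a_B = share(a, rng)
    rin_A, rin_B = share(r, rng)    # input wire of the copy gate
    rout_A, rout_B = share(r, rng)  # output wire: same r, fresh randomness
    # Step 1: announce channel-wise differences against the gate's input wire.
    alpha_A = [(u - v) % P for u, v in zip(a_A, rin_A)]
    alpha_B = [(u - v) % P for u, v in zip(a_B, rin_B)]
    # Step 2: interpolate alpha = a - r and set the correction of the output wire.
    alpha = interpolate_at_zero([(u + v) % P for u, v in zip(alpha_A, alpha_B)])
    delta_r = (delta_a + alpha) % P
    # Opening the output wire (done here only to check the algebra).
    core = interpolate_at_zero([(u + v) % P for u, v in zip(rout_A, rout_B)])
    return (core + delta_r) % P


copied = copy_fanout1(99, random.Random(2))
```

The output wire carries the same value as the original but shares none of its polynomial randomness, which is what makes it safe to verify the two independently.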

Premature Verification. To perform a verification before the final output has
been reached, choose a slice across the circuit and duplicate each wire that 
crosses it, using fanout-1 copy gates. Perform the verification described in §4 
on the portion of the circuit up to the copy gates. Later verifications operate 
on untouched wires after the copy gates and are fully independent of earlier 
verifications. 
Multiplicative Inverses. Using methods pioneered in [BBSS], Alice and Bob can
take the multiplicative inverse of a nonzero intermediate value without revealing
it. Rather than simulating some arithmetical or logical algorithm for multiplicative
inverses, observe that for any nonzero x and u: $x^{-1} = (ux)^{-1} u$.

An inverse gate consists of two wires, each representing u for some random
invertible u. It is employed to invert an arbitrary nonzero x as follows. First, x is
copied using a fanout-2 copy gate; one copy is used for all later applications of x,
while the other is used further in this subprotocol. Second, x is multiplied with
the first copy of u using the multiplication protocol described above. Third,
a premature verification is performed, where the slice includes the copy gate.
Fourth, the product wire for ux is revealed, using the output protocol of §3.3.
Finally, the value of $(ux)^{-1}$ is used as a constant multiplier to the second wire
containing u: each of the values $\phi_i^A(u)$ and $\phi_i^B(u)$ is multiplied by $(ux)^{-1}$, and
$\Delta u$ is set to zero.
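The final scaling step can be checked numerically. In this sketch the copy, multiplication, and premature verification steps are replaced by computing the revealed product ux directly, which is a shortcut taken here for brevity; the prime field GF(P), the evaluation points 1..N, k = 2, and all function names are likewise assumptions.

```python
import random

P = 2_147_483_647               # assumed prime modulus (2^31 - 1)
T = 6                           # t = 2k + 2 with k = 2
XS = list(range(1, 3 * T + 2))  # N = 3t + 1 = 19 points; assumption: points are 1..N


def interpolate_at_zero(values, p=P):
    """Lagrange-interpolate the pairs (XS[i], values[i]) and return the free term."""
    total = 0
    for i, (xi, yi) in enumerate(zip(XS, values)):
        num, den = 1, 1
        for j, xj in enumerate(XS):
            if j != i:
                num = num * (-xj) % p
                den = den * (xi - xj) % p
        total = (total + yi * num * pow(den, -1, p)) % p
    return total


def share(value, rng):
    """Sum-share the evaluations of a random polynomial f (degree <= T) with f(0) = value."""
    coeffs = [value % P] + [rng.randrange(P) for _ in range(T)]
    alice, bob = [], []
    for x in XS:
        fx = sum(c * pow(x, e, P) for e, c in enumerate(coeffs)) % P
        sa = rng.randrange(P)
        alice.append(sa)
        bob.append((fx - sa) % P)
    return alice, bob


def invert(x, rng):
    """Return the value on the wire obtained by scaling the u-wire by (ux)^{-1}."""
    u = rng.randrange(1, P)        # random invertible u
    u_A, u_B = share(u, rng)       # second wire of the inverse gate
    ux = u * x % P                 # shortcut: stands in for the revealed product wire
    m = pow(ux, -1, P)             # public constant (ux)^{-1}
    inv_A = [m * s % P for s in u_A]  # Alice scales her shares locally
    inv_B = [m * s % P for s in u_B]  # Bob scales his shares locally
    # The correction is zero, so the wire value is just the interpolated core value.
    return interpolate_at_zero([(sa + sb) % P for sa, sb in zip(inv_A, inv_B)])


x = 12345
x_inv = invert(x, random.Random(9))
```

Revealing ux is harmless because u is uniform over the invertible elements and used only once, so ux is uniform over them as well.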